Site specific system for generating diversity protein sequences

ABSTRACT

This invention relates to the diversification of nucleic acid sequences by use of a nucleic acid molecule containing a region of sequence that acts as a template for diversification. The invention thus provides nucleic acid molecules to be diversified, as well as those which act as the template region (TR) and in concert with the TR for directional, site-specific diversification. Further provided are methods of preparing and using these nucleic acid sequences.

RELATED APPLICATIONS

This application claims benefit of priority from U.S. Provisional Patent Application Ser. No. 60/598,617, filed Aug. 3, 2004, which is hereby incorporated in its entirety as if fully set forth.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

This invention was made with U.S. Government support of Grant Nos. RO1 AI38417 and AI061598, both awarded by the NIH and 1999-02298, awarded by the USDA. The U.S. Government has certain rights in this invention.

FIELD OF THE INVENTION

This invention relates to the diversification of nucleic acid sequences by use of a nucleic acid molecule containing a region of sequence that acts as a template for diversification. The invention thus provides nucleic acid molecules to be diversified, those which act as the template region (TR) for directional, site-specific diversification and for encoding necessary enzymes, and methods of preparing and using them.

BACKGROUND OF THE INVENTION

Bordetella bacteriophages generate diversity in a gene that specifies host tropism for the host bacterium. This adaptation is produced by a genetic element that combines transcription, reverse transcription and integration with site-directed, adenine-specific mutagenesis. Necessary to this process is a reverse transcriptase-mediated exchange of information between two regions, one serving as a donor template region (TR) and the other as a recipient of variable sequence information, the variable region (VR).

Bordetella species that cause respiratory infections in mammals, including humans, serve as hosts for a family of bacteriophages that encode a unique diversity-generating system which allows the bacteriophage to use different receptor molecules on the bacteria for attachment and subsequent infection (Liu, M. et al. Reverse transcriptase-mediated tropism switching in Bordetella bacteriophage. Science 295, 2091-2094 (2002) and Liu, M. et al. Genomic and genetic analysis of Bordetella bacteriophages encoding reverse transcriptase-mediated tropism-switching cassettes. J. Bacteriol. 186, 1503-17 (2004)). The Bordetella cell surface is highly variable as a result of a complex program of gene expression mediated by the BvgAS phosphorelay, which regulates the organism's infectious cycle (Ackerley, B. J., Cotter P. A., & Miller, J. F. Ectopic expression of the flagellar regulon alters development of the Bordetella-host interaction. Cell 80, 611-620 (1995); Uhl, M. A. & Miller, J. F. Integration of multiple domains in a two-component sensor protein: the Bordetella pertussis BvgAS phosphorelay. EMBO J 15, 1028-1036 (1996); Cotter, P. A. & Miller, J. F. Bordetella. In Principles of Bacterial Pathogenesis. E. Groisman, Ed. Academic Press, San Diego, Calif. pp. 619-674 (2000); and Mattoo, S., Foreman-Wykert, A. K., Cotter, P. A., Miller, J. F. Mechanisms of Bordetella pathogenesis. Front Biosci 6, E168-E186 (2001)).

Bacteriophage (“phage”) BPP-1 preferentially infects virulent, Bvg+ Bordetella bacteria due to differential expression of phage receptor, pertactin (Prn), on the bacterial outer membrane (see FIG. 1 a herein and Emsley, P., Charles, I. G., Fairweather, N. F., Isaacs, N. W. Structure of the Bordetella pertussis virulence factor P.69 pertactin. Nature 381, 90-92 (1996); van den Berg, B. M., Beekhuizen, H., Willems, R. J., Mooi, F. R., van Furth, R. Role of Bordetella pertussis virulence factors in adherence to epithelial cell lines derived from the human respiratory tract. Infect Immun 67, 1056-1062 (1999); and King, A. J. et al. Role of the polymorphic region 1 of the Bordetella pertussis protein pertactin in immunity. Microbiology 147, 2885-2895 (2001)). At characteristic frequencies, BPP-1 gives rise to tropic variants (BMP and BIP) that recognize distinct surface receptors and preferentially infect avirulent, Bvg− bacteria or are indiscriminate to the Bvg status, respectively. These viral parasites have thus evolved to keep pace with the dynamic surface structure displayed by their target host as it traverses its infectious cycle.

Citation of the above documents is not intended as an admission that any of the foregoing is pertinent prior art. All statements as to the date or representation as to the contents of these documents are based on the information available to the applicant and do not constitute any admission as to the correctness of the dates or contents of these documents.

DESCRIPTION OF THE INVENTION

The invention is based in part on the discovery that the agile tropism switching, that is, switching the ability to infect specific bacteria, in Bordetella bacteriophages is mediated by a variability-generating cassette encoded in the phage genome (see FIG. 1 b herein). This cassette functions to introduce nucleotide substitutions at 23 sites in a 134 bp variable region (VR) present at the 3′ end of the mtd locus. Mtd, a putative tail protein, is necessary for phage morphogenesis and infectivity, and the sequence of VR within Mtd determines tropism (bacterial host) specificity. Binding of a BPP-1 derived GST-Mtd fusion protein to the Bordetella cell surface is dependent on expression of the protein pertactin (Prn) on the outer membrane of the bacteria, correlating with the infective properties of the parental phage. The cassette shown in FIG. 1 b therefore functions to generate plasticity in a ligand-receptor interaction via site-directed mutagenesis of, and diversification within, VR sequences.

Thus in a first aspect, the invention provides for a nucleic acid molecule comprising a variable region (VR) which is operably linked to a template region (TR) wherein said TR is a template sequence that directs site-specific mutagenesis of said VR. The nucleic acid molecule may be recombinant, in the sense that it comprises nucleic acid sequences that are not found together in nature, such as sequences that are synthetic (non-naturally occurring) and/or brought together by use of molecular biology and genetic engineering techniques from heterologous sources. Alternatively, the nucleic acid molecule may be isolated, in the sense that it comprises naturally occurring sequences isolated from the surrounding biological factors or sequences with which they are found in nature.

An operable linkage between the VR and TR regions of a nucleic acid molecule of the invention refers to the ability of the TR to serve as the template for directional, site-specific mutagenesis or diversification of the sequence in the VR. Thus in one possible embodiment of the invention, a recombinant nucleic acid molecule may comprise a donor template region (TR) and a variable region (VR) that are physically attached in cis such that the TR serves as the template sequence to direct site-specific mutagenesis in the VR. The separation between the TR and VR regions may be of any distance so long as they remain operably linked. In another embodiment, the TR and VR may not be linked in cis, but the TR retains the ability to direct site specific mutagenesis of the VR. Thus the TR and VR regions may be operably linked in trans, such that the sequences of each region are present on separate nucleic acid molecules.

The invention thus also provides for a pair of nucleic acid molecules wherein a first molecule of the pair comprises a VR which is operably linked to a TR on a second molecule of the pair. As provided by the invention, the TR is a template sequence that directs site-specific mutagenesis of said VR. The nucleic acid molecules are optionally recombinant, in the sense that they may comprise nucleic acid sequences that are not found together in nature, such as sequences that are brought together by use of molecular biology and genetic engineering techniques from heterologous sources. Of course, sequences that are brought together may be synthetic (non-naturally occurring) sequences or those that are from naturally occurring sequences but isolated from the surrounding biological factors or sequences with which they are found in nature.

In embodiments of the invention wherein the VR and TR are in trans, the TR is operably linked to sequences encoding a reverse transcriptase (RT) activity as described below. As such, the VR and reverse transcriptase encoding sequence(s) are also present in trans to each other. In some embodiments, the TR and RT activity coding sequence are in cis to each other, optionally with the TR and RT coding sequence originating from the same organism. In other embodiments, the TR and RT coding sequence may be in trans to each other while remaining operably linked so that the TR still directs RT mediated changes in the operably linked VR. Of course, the TR and/or RT coding sequence may be altered as described below relative to the naturally occurring TR in the organism. Alternatively, the TR and RT coding sequence may be heterologous to each other in that they originate from, or are isolated from, different organisms, or one or the other or both are synthetic (non-naturally occurring) or synthesized (rather than isolated). Synthetic sequences include those which are derived from naturally occurring sequences.

The invention is also based in part on the discovery that sites of variability in the VR of Bordetella bacteriophages correspond to adenine residues in the generally homologous template region, TR, which itself is invariant and essential for tropism switching. The invention is also based in part on the discovery that (translationally) silent (or “synonymous”) substitutions in TR are transmitted to VR during switching, with TR supplying the raw sequence information for variability.

Thus the recombinant nucleic acid molecules of the invention include initial molecules wherein the TR region is identical to the VR, such that the adenine residues present in the TR will result in the mutagenesis or diversification of the corresponding positions in the VR sequence. Stated differently, the invention provides a recombinant nucleic acid molecule wherein the sequence of said TR is a perfect direct repeat of the sequence in said VR such that upon diversification of the VR region, one or more adenine residues in the VR, also found in the TR, will be mutated to another nucleotide, that is cytosine, thymine or guanine, without change in the TR sequence.
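As a rough illustration of the diversity available from such a TR, the following Python sketch lists the adenine positions in a template and computes an upper bound on the number of distinct VR nucleotide sequences. It assumes, for illustration only, that each adenine-templated position can independently end up as any of the four nucleotides. The function names and the short example sequence are hypothetical and are not taken from the phage. For the 23 variable sites described above, this bound is 4^23, or roughly 7 x 10^13 sequences.

```python
def adenine_positions(tr):
    """Return the 0-based indices of adenine residues in a TR sequence."""
    return [i for i, base in enumerate(tr.upper()) if base == "A"]


def max_vr_variants(n_sites):
    """Upper bound on VR sequences if each site may hold any of 4 bases."""
    return 4 ** n_sites


example_tr = "GCAACTGGATCC"  # hypothetical sequence, for illustration only
sites = adenine_positions(example_tr)
print(sites)                        # adenine-templated positions
print(max_vr_variants(len(sites)))  # bound for the example TR
print(max_vr_variants(23))          # bound for 23 variable sites
```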

Alternatively, the invention provides recombinant nucleic acid molecules wherein the TR and VR regions are not identical, such that the TR region directs diversification of the VR. Such diversification may include the mutagenesis of nucleotide residues in the VR based upon the presence of corresponding adenine residues in the TR.

Without being bound by theory, and offered to improve the understanding of the invention, this ability may be mediated by a reverse transcription based mechanism in which a TR transcript serves as a template for reverse transcription during which the nucleotides incorporated opposite the adenine residues of the TR RNA transcript are randomized in the resulting single-stranded cDNA. The TR-derived, mutagenized cDNA sequence is then used to replace all or part of the VR in a process termed “mutagenic homing.” Support for this mechanism is provided by the discovery that in Bordetella bacteriophages, the brt locus, which encodes a reverse transcriptase (RT), is essential for the generation of diversity. Additional support is provided by the discovery that mutagenesis occurs exclusively at sites occupied by adenines in the TR. Artificial substitution of an adenine in the TR with another nucleotide subsequently abolishes variation at that corresponding position in the VR, while introduction of an ectopic adenine subsequently produces a novel site of heterogeneity in the VR.
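The proposed mechanism can be mimicked with a small simulation. The Python sketch below is a simplified model, not the biological process itself: it treats TR and VR as sense-strand strings, copies TR into a new VR, and randomizes every position that is occupied by adenine in the TR, leaving the TR unchanged. The function name, the uniform choice among the four bases, and the example sequence are all assumptions made for illustration.

```python
import random


def mutagenic_homing(tr, rng):
    """Return a new VR copied from TR, with adenine-templated sites randomized.

    Non-adenine positions are copied faithfully; the TR itself is not modified.
    """
    return "".join(
        rng.choice("ACGT") if base == "A" else base
        for base in tr.upper()
    )


rng = random.Random(0)   # fixed seed only so the sketch is reproducible
tr = "GCAACTGGATCC"      # hypothetical template, for illustration only
variants = [mutagenic_homing(tr, rng) for _ in range(5)]
for vr in variants:
    print(vr)
```

In this model, replacing an adenine in `tr` with another base removes variation at that position in every output, and adding an adenine creates a new variable site, which mirrors the observations described above.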

Thus in a further aspect, the invention provides for the diversification of VR sequences via the presence of adenine residues in the TR operably linked to the VR. The invention provides for a nucleic acid molecule wherein the TR region contains one or more adenine residues not found in the VR, such that the adenine residues present in the TR will result in the mutagenesis or diversification of the corresponding positions in the VR sequence. Stated differently, the invention provides a recombinant nucleic acid molecule wherein the sequence of said TR is an imperfect direct repeat of the sequence in said VR due to the substitution of one or more adenine residues for one or more non-adenine residues in said VR. This may be referred to as adenine-mediated diversification.

Alternatively, as compared to the VR, the TR contains one or more insertions of adenine, optionally with the insertion of additional nucleotides to maintain the correct reading frame. As a non-limiting example, groups of three nucleotides (including one or more adenines) may be inserted in-frame into the TR in order to direct the insertion of a variable codon into the VR.

In other embodiments, the invention provides for the diversification of VR sequences via the alteration of other nucleotide residues in the TR operably linked to the VR. As a non-limiting example, a TR that contains a deletion of one or more codons may be used to direct the deletion of corresponding codons from the operably linked VR. As another example, the TR contains an insertion of one or more codons to direct the insertion of the inserted codon(s) into the operably linked VR. The TRs of the invention also include those where the TR contains a deletion or insertion of one or more nucleotides, relative to the operably linked VR, to alter the reading frame of the VR. The deletion or insertion of nucleotides in a TR to direct deletions or insertions in an operably linked VR may be used simultaneously, such as where one portion of the TR is used to direct deletion of nucleotides while another portion of the TR is used to direct insertion of nucleotides. This may be referred to as deletion/insertion mediated diversification.

In yet additional embodiments, the invention provides for diversification based upon non-adenine substitutions of residues in the TR. Thus a nucleotide in the TR may be substituted with a non-adenine residue such that the substitution is transferred to the corresponding position in the operably linked VR. As a non-limiting example, a cytosine (C) to guanine (G) substitution in a TR can be used to result in the same C to G substitution in the operably linked VR. This may be referred to as substitution-mediated diversification.

The invention also provides for the use of adenine-mediated, deletion/insertion mediated, and/or substitution-mediated diversification in any combination to alter the sequence of a VR.
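The three modes can be combined when designing a TR from a desired VR. The following Python sketch shows one hypothetical way to do so at the sequence level: a point substitution that introduces an ectopic adenine, an in-frame codon insertion, and an in-frame codon deletion. The helper names, the 0-based indexing, and the example sequence are assumptions for illustration and do not describe any particular construct.

```python
def substitute(tr, pos, base):
    """Return TR with the residue at 0-based position pos replaced by base."""
    return tr[:pos] + base + tr[pos + 1:]


def insert_codon(tr, codon_index, codon):
    """Insert a 3-nt codon before the given 0-based codon index (keeps frame)."""
    if len(codon) != 3:
        raise ValueError("codon must be exactly 3 nucleotides")
    cut = 3 * codon_index
    return tr[:cut] + codon + tr[cut:]


def delete_codon(tr, codon_index):
    """Delete the codon at the given 0-based codon index (keeps frame)."""
    cut = 3 * codon_index
    return tr[:cut] + tr[cut + 3:]


vr = "GCTTGCGGTCTG"               # hypothetical 4-codon VR without adenines
tr = substitute(vr, 4, "A")       # ectopic adenine at one position
tr = insert_codon(tr, 2, "AAC")   # new codon containing adenines
tr = delete_codon(tr, 4)          # remove the last codon
print(tr, len(tr) % 3 == 0)       # designed TR and whether the frame is kept
```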

In some nucleic acid molecules of the invention, an RT encoding region, and/or an atd region (or bbp7 region), in the vicinity of the 5′ end of a TR may also be present. These regions may be present in cis relative to the TR region. Thus in embodiments of the invention wherein the VR and TR are in trans to each other, the atd region may be in trans relative to the VR. In other embodiments, the atd region is absent or substituted by a functionally analogous region of sequence, such as a promoter sequence that regulates or directs the expression of the TR region and operably linked RT encoding sequence.

As explained above, one property of the diversity-generating system of the invention is the directional transfer of sequence information which accompanies mutagenesis. Thus one TR is able to direct sequence changes in one or more operably linked VRs. Although a VR is highly variable, the operably linked TR is maintained as an uncorrupted source of sequence information including the information to retain the basic structural integrity of the VR encoded protein molecule. The invention is further based on the identification of a nucleic acid sequence designated IMH (initiation of mutagenic homing), which functions in determining the direction of the TR to VR transfer of sequence information.

In some embodiments of the invention, the IMH sequences are those located at the 3′ end of each region in Bordetella bacteriophages and which comprise a 14 bp segment consisting of G and C residues followed by a 21 bp sequence. The IMH sequences at the 3′ end of the VR differ at 5 positions from the sequences in the corresponding TR region (see FIG. 1 c herein). The invention is also based in part on the demonstration that these polymorphisms form part of a cis-acting site that determines the directionality of homing. The demonstration was made by substituting the 21 bp VR IMH sequence with the corresponding IMH-like sequence associated with the 3′ end of the TR (BPP-3′TR). The result was an elimination of tropism switching. The reverse substitution, in which the TR IMH-like sequence was replaced with the VR IMH sequence (BPP-3′VR), did not affect switching. Instead, the placement of VR IMH sequence at the 3′ ends of both VR and TR resulted, surprisingly, in the generation of adenine-dependent variability in TR as well as in VR (see FIG. 1 d herein), an event not previously observed in wild type phage. Variability continued to occur solely at positions occupied by adenine residues in the parental TR, indicating that the basic mechanism of mutagenesis was retained. Furthermore, the pattern of mutations observed in different BPP-3′VR phage indicated that TR was the sole source of both TR and VR variability (see FIG. 1 d herein).

These observations demonstrate that the sequence designated as IMH helps determine the direction of transfer of sequence information from the TR to the VR. They also support the use of the corresponding TR IMH-like sequence at the 3′ end of the TR to prevent corruption of TR while the IMH directs variability to VR. Furthermore, deletion analysis indicated that in VR, the 5′ boundary of information transfer is established by the extent of homology between VR and TR.

The recombinant nucleic acid molecules of the invention may thus contain an IMH sequence located at the 3′ end of the VR and an IMH-like sequence at the end of the TR. Alternatively, the molecules may contain an IMH sequence at the end of both the VR and the TR such that the sequence of the TR may also vary to result in a “super-diversity” generating system.

In embodiments of the invention wherein a sequence of interest (or “desired VR”) to be diversified is not operably linked to the necessary TR region, an IMH sequence can be operably located at the 3′ end of the desired VR followed by operable linkage to an appropriate TR with its IMH-like 3′-region. A non-limiting example of such a system is seen in the case of a desired VR which is all or part of a genomic sequence of a cell wherein insertion of an appropriate IMH and introduction of a TR containing construct with the appropriate corresponding IMH-like region, optionally with a cis linked RT coding sequence, is used to diversify the desired VR. The TR may simply be a direct repeat of the desired VR sequence to be diversified or mutagenized via the adenines present in the TR. Alternatively, the TR may contain ectopic adenines, deletions/insertions, and/or substitutions at positions corresponding to those specific sites of VR where diversity is desired. The length of homology between TR and VR can be used to functionally define the desired VR to be diversified.

The desired VR of the invention may be any nucleic acid sequence of interest for mutagenesis or diversification by use of the instant invention. In some embodiments, the sequence is all or part of a sequence encoding a binding partner of a target molecule. Target molecules may be any cellular factor or portion thereof which is of interest to a skilled person practicing the invention. Non-limiting examples include polypeptides, cell surface molecules, carbohydrates, lipids, hormones, growth or differentiation factors, cellular receptors, a ligand of a receptor, bacterial proteins or surface components, cell wall molecules, viral particles, immunity or immune tolerance factors, MHC molecules (such as Class I or II), tumor antigens found in or on tumor cells, and others as desired by a skilled practitioner and/or described herein. The binding partner (encoded at least in part by the desired VR) may be any polypeptide which, upon expression, binds to the target molecule, such as under physiological conditions or laboratory (in vivo, in vitro, or in culture) conditions.

In some embodiments of the invention, the binding partner is a bacteriocin (including a vibriocin, pyocin, or colicin), a bacteriophage protein (including a tail component that determines host specificity), capsid or surface membrane component (including those that determine physiologic, pharmacologic, or pharmaceutical properties), a ligand for a cell surface factor or an identified drug or diagnostic target molecule, or other molecules as desired and/or described herein.

Any portion, or all, of the coding region for a binding partner can be used as the desired VR. In some embodiments of the invention, however, the desired VR is the 3′ portion of said sequence encoding said binding partner. The 3′ portion of a coding sequence ends at the last codon. In other embodiments of the invention, the desired VR is located within about 50, about 100, about 150, about 200, about 250, about 300, or about 350 or more codons of the last codon in a coding sequence to be diversified. Stated differently, the desired VR may contain about 20, about 50, about 100, about 150, about 200, about 250, about 300, about 400, about 450, about 500, about 550, about 600, about 650, about 700, about 750, about 800, about 900, about 950, about 1000, about 1500, about 2000, about 2500, or about 3000 or more nucleotides from the last nucleotide of the coding region. In some embodiments, the IMH is not part of the translated portion of the VR, and as such may optionally be in an intron. Stated differently, some embodiments of the invention provide for an IMH which is transcribed, but not translated, or not transcribed or translated, while the VR and the larger sequence containing the VR may be transcribed and translated and encode a polypeptide.

In additional embodiments, the binding partner may be part of a fusion protein such that it is produced as a chimeric protein comprising another polypeptide. The other polypeptide member of the fusion protein may be heterologous to the binding partner. Alternatively, it may be another portion of the same binding partner such that the fusion protein is a recombinant molecule not found in nature.

In other embodiments, the desired VR for site specific mutagenesis is a non-translated, and optionally non-transcribed, regulatory region. The invention may be utilized to diversify such regulatory sequences to modify their function. In the case of 5′ regulatory elements, as a non-limiting example, the invention may be used to derive regulatory regions that direct expression more strongly (e.g. a stronger promoter) or less strongly (e.g. a weaker promoter). Alternatively, the regulatory regions may be diversified to increase or decrease their sensitivity to regulation (e.g. more tightly or less tightly regulated). In the case of 3′ regulatory elements, the invention may be used to derive regions that increase or decrease the stability of expressed RNA molecules. Other regulatory sequences may be similarly diversified.

As described above, the invention also provides for isolated nucleic acid molecules derived from naturally occurring sequences. Such an isolated nucleic acid molecule may be described as comprising a donor template region (TR) and a variable region (VR) wherein said TR is a template sequence operably linked to said VR to direct site specific mutagenesis of said VR. These isolated nucleic acid molecules may comprise the coding sequence containing the VR and TR as well as other components necessary to direct site specific mutagenesis of the VR in a heterologous system. Non-limiting examples of additional sequences from naturally occurring sequences are those that encode an RT activity and those that function as an IMH, to provide directionality to the transfer of sequence information from a TR to a VR, or an IMH-like sequence to prevent or reduce the frequency of changes in the TR sequence. Molecules containing these VR and TR regions with these other components are termed diversity generating retroelements (DGRs) of the invention.

These isolated nucleic acid molecules may also serve as a source of additional IMH sequences, RT coding regions, and atd regions for use in the practice of the instant invention. Non-limiting examples of isolated nucleic acid molecules include those shown in FIG. 2 herein. These include molecules isolated from Vibrio harveyi ML phage, Bifidobacterium longum, Bacteroides thetaiotaomicron, Treponema denticola, or a DGR from cyanobacteria. Non-limiting examples of such cyanobacteria include Trichodesmium erythraeum #1, Trichodesmium erythraeum #2, Nostoc PPC ssp. 7120 #1, Nostoc PPC ssp. 7120 #2, or Nostoc punctiforme. The relevant sequences illustrated in FIG. 2 are all publicly available and accessible to the skilled person.

In some embodiments, the invention provides an isolated nucleic acid molecule comprising a donor template region (TR) and an operably linked RT coding sequence. Such a molecule is preferably not from Bvg+ tropic phage-1 (BPP-1), Bvg⁻ tropic phage-1 (BMP-1), or Bvg indiscriminate phage-1 (BIP-1) bacteriophage. The isolated molecule may be from a bacteriophage, a prophage of a bacterium, a bacterium, or a spirochete.

Of course, cells comprising the nucleic acid molecules of the invention are also provided. Such cells may be prokaryotic or eukaryotic, and are capable of supporting site-specific mutagenesis as described herein. Cells that are not capable of supporting such mutagenesis may still be used to replicate nucleic acid molecules of the invention or to generate their encoded protein molecules for subsequent use. In the case of eukaryotic cells, the nucleic acids of the invention may be modified for their use in a eukaryotic environment. These modifications include the use of promoter sequences recognized by a eukaryotic RNA polymerase; the introduction of intron sequences in the TR-brt to facilitate export of RNA transcripts from nucleus to cytoplasm for translation of the brt; and the presence of a nuclear localization signal (NLS) coding sequence as part of the RT coding sequence such that the RT polypeptide contains an NLS to direct its transport to, and/or retention in, the eukaryotic nucleus. In some embodiments, the NLS is located at the N or C terminus of the RT polypeptide.

In an additional aspect, the invention provides a method of site-specific mutagenesis of a nucleic acid sequence of interest present as a VR of the invention. Such a method would comprise the use of a nucleic acid molecule as described herein wherein the VR comprises said nucleic acid sequence of interest and the TR is a direct repeat of the VR or the sequence of interest. Thus, mutagenesis will be limited to the adenine residues present in the TR. Alternatively, a non-identical TR, such as a repeat of the VR or the sequence of interest containing ectopic adenine residues, insertions, deletions, or substitutions may be used. The method would further include the expression of such nucleic molecules in a cell such that one or more nucleotide positions of the VR or sequence of interest is substituted by a different residue.

Such methods of the invention may be performed to allow more than one nucleotide position of the VR or the sequence of interest to be substituted. As noted above, the VR or sequence of interest may encode all or part (such as the 3′ portion) of a binding partner of a target molecule. These methods of the invention may, of course, be used to alter the binding properties of a binding partner such that its interaction with a target molecule will be changed. Non-limiting examples of such alterations include changing the specificity or binding affinity of a binding partner. The methods may be used to modify a particular binding partner such that it will bind a different target molecule. A non-limiting example of this aspect of the invention is the modification of a phage tropism determinant such that it will bind a heterologous bacterial surface component of interest. A bacteriophage that is made to express such a derivative would thus be infectious for a heterologous bacterium. This may be advantageously used as a means of creating phage or phage parts capable of binding to, infecting and/or killing (e.g. via lysis or dissipation of membrane potential) a particular strain of bacteria not normally affected by phage expressing the progenitor tropism determinant. The invention may also be used as a means of broadening or expanding the bacteriophage host range, or the binding range of a part or parts thereof, to include target molecules, species, or strains not commonly bound or infected by the parent phage or any phage. Another non-limiting example is modification of a sequence to restore or alter a binding or enzymatic activity, such as restoration of a phosphotransferase activity.

As described herein, site-specific mutagenesis of a known bacteriophage protein also may be practiced by the use of an isolated nucleic acid molecule containing a naturally occurring combination of VR and TR as described herein. Non-limiting examples of such molecules include those from Vibrio harveyi ML phage, Bifidobacterium longum, Bacteroides thetaiotaomicron, Treponema denticola, or a DGR from cyanobacteria. Non-limiting examples of such cyanobacteria include Trichodesmium erythraeum #1, Trichodesmium erythraeum #2, Nostoc PPC ssp. 7120 #1, Nostoc PPC ssp. 7120 #2, or Nostoc punctiforme.

In a further aspect, the invention provides a method of preparing a recombinant nucleic acid molecule as described herein by operably linking a first nucleic acid molecule comprising said VR to a second nucleic acid molecule comprising said TR such that said TR acts as a template sequence that directs site-specific mutagenesis of said VR. In the case of a linkage in cis between the VR and the TR, the first and second nucleic acid molecules would be covalently ligated together in an operative fashion as described herein. In the case of a linkage in trans, the first and second nucleic acid molecules would be placed in the same cellular environment or an in vitro reaction mix for site-specific mutagenesis in an operative fashion.

In yet another aspect, the invention provides a method of identifying additional RT coding sequences, IMH and IMH-like sequences, and corresponding TR and VR sequences. The method is based upon use of identified binding motifs of the RT activity of the invention to identify additional RT coding sequences in other organisms. The region near a putative additional RT coding sequence is then searched for nearby IMH type sequences which 1) are linked to putative TR sequences or 2) are used to find VR linked IMH sequences.
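One step of such a search, scanning a region for IMH type sequences, might be sketched in Python as follows. The sketch assumes the layout described above for Bordetella bacteriophages, namely a 14 bp segment consisting only of G and C residues followed by a 21 bp sequence; elements from other organisms may not share this layout. The function name, the regular expression, and the test sequence are hypothetical, and a real search would also need to compare candidates against the putative TR and VR.

```python
import re

# Lookahead so that overlapping candidates within long G/C runs are all reported.
IMH_CANDIDATE = re.compile(r"(?=([GC]{14})([ACGT]{21}))")


def find_imh_candidates(seq):
    """Return (start, gc_segment, following_21_bp) for each candidate site."""
    seq = seq.upper()
    return [(m.start(), m.group(1), m.group(2))
            for m in IMH_CANDIDATE.finditer(seq)]


gc_segment = "GCGCGGCCGCGCGG"   # hypothetical 14 bp G/C stretch
tail = "AT" * 10 + "A"          # hypothetical 21 bp sequence
region = "AT" + gc_segment + tail + "TT"
hits = find_imh_candidates(region)
for start, gc, following in hits:
    print(start, gc, following)
```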

The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the drawings and detailed description, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1 a to 1 d show tropism switching by Bordetella bacteriophage. In FIG. 1 a the specificities and tropism switching frequencies are depicted above the B. bronchiseptica BvgAS-mediated phase transition. BPP, BMP and BIP are tropic for Bvg+ phase, Bvg− phase or either phase, respectively. FIG. 1 b shows the components of the variability-generating cassette. The 3′ portion of mtd is expanded and the 134 bp VR sequence is underlined. Variable bases (red) correspond to adenine residues in TR. FIG. 1 c shows that in wild type (wt) BPP-1, information is transferred unidirectionally from TR to VR and is accompanied by adenine-dependent mutagenesis. BPP-3′TR fails to switch tropism, whereas BPP-3′VR switches tropism at wild type frequencies and generates variability in TR as well as VR. In FIG. 1 d, TR adenines are shown at the top followed by the corresponding nucleotides in the parental VR. TR1-9 are TR sequences derived from in vitro variability assays performed on phage BPP-3′VR. Red nucleotides show positions that varied. Sites of variability align with adenine residues in the parental TR.

FIGS. 2 a and 2 b show diversity-generating retroelements (DGRs) in bacterial and bacteriophage genomes. FIG. 2 a shows a phylogenetic tree of DGRs in relation to other classes of retroelements. GenBank accession numbers are shown. DGR, diversity generating retroelements (red lines); G2, group II introns; Rpls, mitochondrial retroplasmids; Rtn, retrons; NLTR, non-LTR elements; LTR, LTR retroelements; Telo, telomerases; PLE, Penelope-like elements. RT domains were analyzed using the neighbor-joining algorithm of PHYLIP 3.6b, with 1000 bootstrap samplings, which are expressed as a percent. DGRs form a well-defined clade with 92% bootstrap support (red lines; Brt circled in pink). Group II introns are predicted to be their closest relatives, but with very weak support (55%). FIG. 2 b shows nine putative DGRs in comparison to the Bordetella phage DGR. All DGRs include an ORF (191-888 aa) that contains a 103-190 bp VR (grey arrow) located at the C-terminus, a spacer region of 136-1,220 bp which in some cases contains a small open reading frame of similar size to atd, and a TR (black arrow) of equal length to VR in close proximity (22-339 bp) to RT (283-415 aa). For the Trichodesmium and Nostoc elements containing two VRs, VR1 and VR2 appear to have resulted from different mutagenic homing events originating from the same TR. E-values for RTs, in comparison to Brt, range from 1E-11 to 4E-37.

FIGS. 3 a-3 c show the results of multiple substitution experiments. In FIG. 3 a, TR of phage MS1 contains synonymous substitutions marked with black lines (see Example 1 herein); TR adenines are marked with red lines with adjacent sites represented by a single line. Data boxed in purple or blue schematically represent the VR sequences of nine independent tropism variants. Purple box, BPP-MS1→BMP or BIP; blue box, BMP-MS1→BPP. A black line indicates that a substitution was acquired from TR; a red line indicates that a position varied with respect to the parental VR. The frequencies of transfer of synonymous substitutions (transmission histograms) are shown at the bottom. Purple bars, BPP-MS1→BMP/BIP; blue bars, BMP-MS1→BPP. FIG. 3 b shows the results of in vitro variability assays (see Example 1 below) following selection for transfer of synonymous substitutions from TR to VR that confer resistance to MboII (position 100, boxed in purple) or AflIII (position 37, boxed in blue). Transmission histograms corresponding to the MboII selection (purple bars) or AflIII selection (blue bars) are shown at the bottom, along with positions of restriction enzyme cleavage (arrows). FIG. 3 c shows that the TR of phage MS2 contains a 1 bp deletion at position 106 which, if transferred to VR, results in a frameshift mutation in mtd and non-infectious phage (see Methods). The data boxed in purple depict VR sequences of BPP-MS2→BMP/BIP tropism variants. TR of phage MS3 contains a 1 bp deletion at position 9 which, if transferred to VR, results in non-infectious phage. The data boxed in blue show BMP-MS3→BPP tropism variants. Transmission histograms corresponding to BPP-MS2→BMP/BIP (purple bars) or BMP-MS3→BPP (blue bars) reactions are shown at the bottom. Asterisks indicate the lack of transfer of frameshift mutations that are subject to negative selection.

FIGS. 4 a and 4 b show that mosaic VR sequences result from mutagenic homing. In FIG. 4 a, the average length of TR transferred under different selection conditions is shown with a histogram, and the distribution of transferred sequence lengths is depicted with bubbles (size represents the relative number of clones of a given length). Complex selections, such as those requiring a tropism switch (BPP→BMP; BMP→BPP), select for relatively rare isolates with longer stretches of transferred sequence. Simpler selections for transfer of single-nucleotide substitutions that result in restriction enzyme resistance (AflIIIs→AflIIIr; MboIIs→MboIIr) select for more abundant clones containing shorter stretches of transferred sequence, regardless of the point of selection. FIG. 4 b shows the generation of VR sequences containing random portions of TR of variable length. In the model proposed with the instant invention, reverse transcription is followed by mutagenic homing, in which a TR-derived reverse transcript integrates in a homology-dependent manner at VR, forming a heteroduplex. This event could initiate at the IMH site and occur by a mechanism analogous to target-primed reverse transcription (TPRT), as proposed for group II introns (Morrish, T. A. et al. DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat Genet 31, 159-165 (2002) and Wank, H., SanFilippo, J., Singh, R. N., Matsuura, M., Lambowitz, A. M. A reverse transcriptase/maturase promotes splicing by binding at its own coding segment in a group II intron RNA. Mol Cell 4, 239-250 (1999)). The resulting heteroduplex would contain a high density of mismatched base pairs (red asterisks) due to adenine-specific mutagenesis. The heteroduplex is then partially converted to the parental VR sequence via mismatch repair and/or recombination. DNA replication would produce mosaic VRs with patches of TR-derived variable sequence.

FIG. 5 shows the use of in-frame deletions to define the boundaries of the BPP-1 diversity-generating cassette. Internal in-frame deletions were introduced into phage genes flanking the brt-mtd region. A map of the BPP-1 genomic segment containing the tropism switching region is shown, along with phenotypes resulting from in-frame deletions. Viability is defined as the production of infectious phage particles following induction of lysogens with mitomycin C. Variability is defined as the production of phage DNA containing adenine-mutagenized VR sequences following induction of lysogens, as detected with in vitro variability assays. Phage genes bbp1, bbp2, bbp3, and bbp4 are all essential for BPP-1 viability, but unnecessary for VR variability. Phage genes brt, atd, and mtd are all necessary for VR variability in these constructions. Of these three variability cassette genes, only mtd is essential for BPP-1 viability. Phage genes bbp9 and bbp10 are not required for variability or viability. All variability determinants identified to date lie within a defined, continuous region of the phage genome, supporting the idea that the variability-generating loci function as a cassette.

FIG. 6, parts a-c, show the results of adenine-dependent mutagenesis of TR. Part a: the top sequence shows a TR with 23 naturally occurring adenines (bold) and an additional ectopic adenine residue introduced at a new site by site-specific mutagenesis followed by allelic exchange (position 55, red bold). VR1-VR5 show VR sequences from independently isolated tropism variants in which the ectopic adenine was observed to vary. The actual frequency of variability at the ectopic adenine is shown in part c. These data demonstrate that ectopic addition of an adenine residue in TR creates a new site of variability in VR. Part b: the top sequence shows a TR in which a naturally occurring adenine pair at position 23-24 has been substituted with GC (red bold). The remaining 21 naturally occurring adenines are in bold. Out of 20 independently isolated tropism variants, of which a representative 5 are shown (VR1-VR5), no variability was observed at positions 23-24. Since the frequency of alteration of the naturally occurring adenine pair at position 23-24 during tropism switching is ˜95%, the elimination of adenine residues in TR eliminates variability at the corresponding position in VR. Part c: frequencies of mutagenesis at transmitted adenines were calculated using in vitro variability assays. The frequency of mutagenesis at pairs of transmitted adenines resulting in a substitution at either position (AA→NA/NN; AA→AN/NN) or both positions (AA→NN) is shown (n=20). Mutagenesis frequencies at the single endogenous adenine at position 35 (endogenous A→N, n=50) or the ectopic adenine at position 55 (ectopic A→N, n=50) are also shown. As observed and provided within the scope of the invention, an adenine that is part of a pair is much more likely to vary than a single adenine, and the frequency of variability at the ectopic adenine at position 55 (see Part a above) is nearly identical to that for the endogenous adenine at position 35.

FIG. 7, parts a and b, show the results of internal deletion experiments. Part a: stretches of sequence were deleted from TR and VR of BPP-1 as indicated on the diagram (to scale) and the resulting strains were tested for variation in VR using in vitro variability assays. Variation in VR is indicated by “+” in the column to the right, while lack of variation is indicated by a “−”. Except for very large deletions (Δ118), the system was able to accommodate deletions of different size and location (Δ18, Δ39, Δ61). Most significantly, a large deletion of the 5′ portion of VR (Δ61) still displayed variation, indicating that there is no 5′ cis-acting site analogous to IMH and that homing in this system is in part based on homology. Part b: sequences of variant VRs (VR1-4) derived from Δ61 phage are aligned against TR and VR (above). The sequence between the deletion and the G/C stretch is shown. The MboII site of selection is also shown (underlined), together with mutagenesis (red) at residues corresponding to adenines in TR (bold).

FIG. 8, parts a and b, show the tropism switching frequencies of phage carrying multiple substitutions in TR. Strain abbreviations are the same as in FIG. 3. MS1 carries 5 synonymous substitutions while MS2 and MS3 carry a 1 bp deletion in addition to synonymous substitutions (see maps in FIG. 3). Part a: multiple substitution constructs in the BMP-1 background (MS1, MS2, MS3) or wild type BMP-1 were selected for switching to the BPP tropism. Phage induced from lysogens were propagated on Bvg⁻ bacteria and the fraction of phage able to form plaques on Bvg+ was measured. Part b: multiple substitution constructs in the BPP-1 background or wild type BPP-1 were selected for switching to the BMP or BIP tropisms. Phage induced from lysogens were propagated on a Bvg+ host and the fraction of phage able to form plaques on a Bvg⁻ host was measured. In parts a and b, the frequencies of tropism switching for MS2 and MS3 phages are lower than wild-type, indicating that a fraction of phage was eliminated by negative selection. In both cases, however, these mutant phages were able to switch tropism while avoiding the transmission of frameshift mutations (FIG. 3 c).

FIG. 9 shows the nucleotide sequence alignments of VRs and TRs from different DGRs. TR sequence is shown on top with VR sequence(s) on the bottom. Stop codons are shown in lower case. Adenines in TR are shown in bold, while the corresponding bases in VR are boldfaced only if different from TR. Note that the differences are largely limited to TR adenines, as opposed to non-adenine substitutions, indicating that the basic mechanism of mutagenesis is conserved across DGRs. Mismatches at the 3′ end, similar to IMH in Bordetella phage, are shown in color (green, VR; blue, TR). In addition, a well-conserved TCTT motif at the 3′ end, whose functional significance is unclear, is underlined. These similarities attest to likely conservation of mechanistic features, despite the lack of sequence identity between the different elements.

FIG. 10 shows schematics representing constructs of the invention. In the first construct, an atd region is present between the 3′ end of the indicated terminator and the start of the TR region. In the second construct, the atd region is present between the promoter and the indicated TR region. In the third construct, no atd or TR region is present in the construct.

FIG. 11 shows mutagenesis of VR on an induced prophage.

FIG. 12 shows an illustration of the design to mutagenize a heterologous sequence with a novel TR and IMH.

FIG. 13 shows an illustration of constructs used to mutagenize a phosphotransferase encoding sequence.

FIG. 14 shows the VR amino acid sequence used in the mutagenesis of a non-Bordetella APH(3′)-IIa encoding sequence. The large “L” delineates the location of the insertion of an amber codon at position 243 for the elimination of kanamycin binding and inactivation of kanamycin resistance.

FIG. 15 shows an alignment of sequences from various DGRs (including Cyanobacterial DGRs, and those from Nostoc punctiforme, Nostoc spp. 7120 #1 & #2, Trichodesmium Erythraeum #1 & #2 and others) of the invention.

DETAILED DESCRIPTION OF SPECIFIC MODES OF PRACTICING THE INVENTION

This invention provides nucleic acid molecules and methods for their use in site-specific mutagenesis of a sequence of interest which is, in whole or in part, the VR in an operative linkage between the VR and a homologous repeat (TR) that directs the diversification of the sequence of interest at positions occupied by adenines within the TR. The extent of diversity that can be generated by the invention is not equal to the number of adenine positions that are capable of directing substitutions in the VR. Instead, each adenine in TR can result at that position in 3 different nucleotide substitutions in the VR, many of which will result in a substituted amino acid at the corresponding position encoded by the VR. As a non-limiting example, the presence of 23 adenine nucleotides in the practice of the invention is theoretically capable of generating over 10¹² distinct polypeptide sequences.
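As a non-limiting illustration, the arithmetic behind this estimate may be sketched as follows. The sketch assumes, for illustration only, that each TR adenine is independently either retained or replaced by one of the three other nucleotides in the VR. The function names and the AAC codon used for the protein-level example are hypothetical choices and are not taken from the description above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def nucleotide_variants(n_adenines: int) -> int:
    """Upper bound on VR nucleotide sequences: 4 outcomes per TR adenine
    (the adenine retained, or one of 3 substitutions). Assumes independence."""
    return 4 ** n_adenines


def amino_acids_from_codon(tr_codon: str) -> set:
    """Amino acids (or '*' for stop) reachable when every adenine in a
    TR codon may become any of the four nucleotides in the VR."""
    options = [BASES if base == "A" else base for base in tr_codon]
    return {CODON_TABLE["".join(c)] for c in product(*options)}


if __name__ == "__main__":
    print(nucleotide_variants(23))                 # nucleotide-level bound
    print(sorted(amino_acids_from_codon("AAC")))   # hypothetical example codon
```

The nucleotide-level count is an upper bound. The number of distinct polypeptides is lower, because some substitutions are synonymous, and it depends on the codon context of each adenine.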

Thus the invention provides for the presence of up to 23 or more adenine nucleotides in a given TR of the invention to direct mutagenesis in the corresponding VR. The presence of adenine residues may be due to natural occurrence in the TR or the result of deliberate insertion or substitution into the TR as described herein. In the case of naturally occurring adenine nucleotides in the TR, mutagenesis may be allowed to occur or may be avoided by a substitution of the adenine nucleotide to a non-adenine nucleotide without changing the encoded amino acid (silent substitution). In the case of deliberate insertion or substitution, the invention provides for the introduction of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more adenine nucleotides into a TR.
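As a non-limiting illustration of the silent substitution option, the following sketch lists, for a given TR codon, the synonymous codons that contain no adenine. It assumes the standard genetic code; the function name is hypothetical. An empty result indicates that the adenine cannot be removed from that codon without changing the encoded amino acid.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def adenine_free_synonyms(codon: str) -> list:
    """Return the codons that encode the same amino acid as `codon`
    and contain no adenine (candidates for a silent substitution)."""
    amino_acid = CODON_TABLE[codon]
    return sorted(
        c for c, aa in CODON_TABLE.items() if aa == amino_acid and "A" not in c
    )


if __name__ == "__main__":
    for codon in ("GCA", "TCA", "AAC"):
        print(codon, CODON_TABLE[codon], adenine_free_synonyms(codon))
```

Under the standard code, an alanine codon such as GCA has adenine-free synonyms, whereas an asparagine codon such as AAC has none, so the silent substitution route is available only at some adenine positions.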

As described herein, the invention provides recombinant and isolated nucleic acid molecules comprising a variable region (VR) which is operably linked to a template region (TR), wherein the VR and TR sequences are in the same molecule or separate molecules, and wherein said TR is a template sequence operably linked to said VR in order to direct site specific mutagenesis of said VR. Preferably, however, the molecule is not a derivative, containing only one or more deletion mutations, of the major tropism determinant (mtd) gene, the atd region, and/or the brt coding sequence, of Bvg+ tropic phage-1 (BPP-1) bacteriophage.

The VR and TR regions may be physically and operably linked in cis or operably linked in trans as described herein. The separation between the two regions when linked in cis can range from about 100 base pairs or less to about 1200 base pairs or more. When associated via a cis or trans configuration, expression of the TR and operably linked RT coding sequence may be under the control of an endogenous or heterologous promoter. When associated in trans, expression of the TR and operably linked RT coding sequences may be under the control of an endogenous or heterologous, regulatable promoter or promoters.

The nucleic acid molecules of the invention may also contain an RT encoding region in cis with the TR region. Non-limiting examples of RT coding sequences include those from Vibrio harveyi ML phage, Bifidobacterium longum, Bacteroides thetaiotaomicron, Treponema denticola, or a DGR from cyanobacteria, such as Trichodesmium erythraeum, the genus Nostoc, or Nostoc punctiforme as provided herein. The relevant RT encoding sequences from these sources are all publicly accessible and available to the skilled person. Additionally, some nucleic acid molecules may contain an atd region (or bbp7 region) immediately 5′ of the TR. Without being bound by theory, and offered to improve the understanding of the invention, the atd region is believed to participate in regulating transcription of the TR and so may be augmented by use of a heterologous promoter.

In embodiments of the invention comprising the use of a heterologous promoter, the promoter may be any that is suitable for expressing the TR and RT coding sequence under the conditions used. As a non-limiting example, when a prokaryotic cell is used with the VR and TR regions, the promoter may be any that is suitable for use in the prokaryotic cell. Non-limiting examples include the filamentous haemagglutinin promoter (fhaP), lac promoter, tac promoter, trc promoter, phoA promoter, lacUV5 promoter, and the araBAD promoter. When the conditions are those of a eukaryotic cell, non-limiting examples of promoters include the cytomegalovirus (CMV) promoter, human elongation factor-1α promoter, human ubiquitin C (UbC) promoter, SV40 early promoter; and for yeast, Gal 11 promoter and Gal 1 promoter. Of course, the VR may remain under the control of an endogenous promoter, if present, or be under the control of another heterologous promoter independently selected from those listed above or others depending on whether a prokaryotic or eukaryotic cell is used. If a cell-free system is used in the practice of the invention, then the promoter(s) will be selected based upon the source of the cellular transcription components, such as RNA polymerase, that are used.

The nucleic acid molecules of the invention may also contain an IMH sequence or a functional analog thereof. The function of the IMH has been described above, and the invention further provides for the identification, isolation, and use of additional functionally analogous sequences, whether naturally occurring or synthetic. In the case of naturally occurring functional analogs, they may be used with heterologous VR and TR sequences in the practice of the instant invention.

Non-limiting examples of IMH and IMH-like sequences for use in the practice of the invention include those shown in the following Table. An IMH or IMH-like sequence may contain the GC-rich region through the 3′ end.

TABLE 1

Columns, left to right: region (TR or VR); VR 3′-end nucleotides; GC-rich region (50-91% GC; length 4-31 nt; TC or TTGG . . . start; nucleotide runs of 3-7 nt), with the GC content shown on the TR row and the length on the VR row where given; mismatches (1-5; length 1-9 nt); IMH 3′ end. Stop codons are shown in lower case.

     VR 3′-end  GC-rich region                              Mismatches IMH 3′ end

BPP1
TR   GCGAACA-   TCGG-GGCGCGCGGCGTCTGTG (81% GC)             CCCATCACC  TTCTTG
VR   GCGTTCT-   TCGG-GGCGCGCGGCGTCTGTG (21 nt)              ACCACCTGA  TTCTTGAGtag

B. longum
TR   TGGAACA-   TCGG-GGGCCGC (91% GC)                       ATATCC     G
VR   TGGCACC-   TCGG-GGGCCGC (11 nt)                        CTTTCT     GCGCTCGGTCGCACGAAGGCGtag

Bacteroides T.
TR   ACAACAA-   TCGG-GCGTACGGGTTTGGG (68% GC)               G          TGCGTTCTTCCCAAGAAT
VR   ACTACTC-   TCGG-GCGTGCGGGTTTGGG (19 nt)                T          TGCGTTCTTCCCAAGAAtag

Vibrio harveyi
TR   AATAGCA-   TCGG-TTTTCGCCCCGCT (65% GC)                 CTTGA      TGT
VR   AGTAGCA-   TCGG-TTTTCGCCCCGCT (17 nt)                  TTCTT      TGTGtaa

T. denticola
TR   GACAACAA-  TCTT-GGCTTCCGCTTGGCTTG (57% GC)             TCGGCCC
VR   TGCAGCGA-  TCTT-GGCTTCCGCCTGGCTTG (21 nt)              CCGGCCT    taa

Trichodesmium erythraeum #2
TR   CGAGTCA-   TCTCGTCTTCCCCGGTGGTTTCTGGCTTTCATTCCTAGTATTCTTC
VR   CGAGTCA-   TCTCCTCTTCCCCGGTGGTTTCTGGCTTTCATTCCtagTATTCTTC

Trichodesmium erythraeum #1
TR   CAACAATA-  TTGGTTTTCGT-CTTGT-GAGTTTCCCCCCCAG (52% GC)  C          ACTCTT
VR1  CATCAATT-  TTGGTTTTCGT-CTTGT-GAGTTTCCCCCCCAG (31 nt)   G          ACTCTTGAAtag
VR2  CGACTTTG-  TTGGTTTTCGT-CTTGT-GAGTTTCCCCCCCAG           G          ACTCCtga

Nostoc spp. 7120 #1
TR   AACAATA-   TTGGTTTTCGT-GTTGT-CTGCGCGTTCGGGAG (55% GC)  TACTC      TTCAC
VR1  TACAGTT-   TTGGTTTTCGT-GTTGT-CTGCGCGTTCGGGAG (31 nt)   GATTC      TTCAGtag
VR2  TACGCTG-   TTGGTTTTCGT-GTTGT-CTGCGCGTTCGGGAG           GACTT      TTCAGtag

Nostoc punctiforme
TR   AACAATA-   TTGGTTTTCGT-GTTGT-CTGCGCGTTCGGGA (53% GC)   TGTC       TCTTCA
VR1  AGCAATG-   TTGGTTTTCGT-GTTGT-CTGCGCGTTCGGGA (30 nt)    GGAT       TCTTCAGtag
VR2  AGCACTC-   TTGGTTTTCGT-GTTGT-CTGCGCGTTCGGGA            GGAT       TCTTCAGtag

Nostoc spp. 7120 #2
TR   CAACGTA-   --GGTTTTCGG-GTTGT-GGTTGTGCGGGGCA            G
VR   GCGCGTG-   --GGTTGTCGG-GTTGT-GGTTGTGCGGGGCA            GGCT       TTCTtag

Chlorobium phaeobacteroides
TR   AACAATA-   -TCGG-TTTTCGT-GTTGT-TCGTCCCA                ATCA       TGCCCGTTTTATGGTGCGGTAA
VR1  GGCGTTA-   -TCGG-TTTTCGT-GTTGT-TCGTCCCA                GTCA       TCTTTTGtgaTTATCTGAT
VR2  TACGGTT-   -TCGG-TTTTCGT-GTTGT-TCGTCCCA                GTCA       TCTTTTGtgaTTATCTGATAC

Pelodictyon phaeoclathratiforme
TR   AACAATA-   -TTGG-CTTTCGG-GTTGT-CCGTTCCA                ATCAT      GCCCCTTTCGATGCGTGTTAAAG
VR   GGCAATG-   -TTGG-CTTTCGG-GTTGT-CCGTTCCA                GTCCC      TCTTCCtgaTCTTCTGTCTTTCT

Prosthecochloris aestuarii
TR   ACAACAA-   TTTGGGCTTCCGG-GTTGT-GAG                     TACAAAG    TATCGCCAGATGGGGATTGTTTAC
VR1  ACGACGT-   TTTGGGCTTCCGC-CTTGT-GAG                     GCAGCCT    tagTATCCCTTGGGGTTT
VR2  ACGACGA-   TTTGGGCTTCCGC-CTTGT-GAG                     GCAGCCT    tagTATCTCTTGGGGTTTTTACCA

In yet another aspect, the invention provides a method of identifying additional RT coding sequences, IMH and IMH-like sequences, TR sequences, and VR sequences. In one embodiment, the invention provides a method of identifying relevant RT coding sequences by searching sequences for the presence of one or both of the conserved nucleotide binding site motifs having the amino acid sequences IGXXXSQ or LGXXXSQ, where “X” represents any naturally occurring amino acid. Any suitable methodology for searching sequence information may be used. Non-limiting examples include the searching of protein sequence databases with BLAST or PSI-BLAST.
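As a non-limiting illustration, the IGXXXSQ and LGXXXSQ motifs may be located in a protein sequence with a simple pattern search. The sketch below is a minimal example only; it assumes that “X” is any of the 20 standard amino acids, and the function name and the test sequence are hypothetical. It is not a substitute for a database search with BLAST or PSI-BLAST.

```python
import re

# "X" in IGXXXSQ / LGXXXSQ is taken here to be any of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
RT_MOTIF = re.compile(rf"[IL]G[{STANDARD_AMINO_ACIDS}]{{3}}SQ")


def find_rt_motifs(protein: str) -> list:
    """Return (0-based start, matched text) for each non-overlapping
    IGXXXSQ or LGXXXSQ occurrence in a protein sequence."""
    return [(m.start(), m.group()) for m in RT_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    hypothetical_protein = "MKTIGAAASQPLLGVWYSQK"  # made-up sequence for illustration
    print(find_rt_motifs(hypothetical_protein))
```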

The invention also provides a method of identifying IMH sequences, said method comprising identifying an RT coding sequence in a genome of an organism, optionally as described above; searching the coding strand within about 5 kb of the RT ORF to identify an IMH-like sequence containing an 18-48 nucleotide stretch of adenine-depleted DNA; and

a) using the putative IMH-like sequence to search genome-wide for a closely related putative IMH, and comparing the DNA sequences located 5′ to the IMH-like and putative IMH sequences to find homologous TR and VR regions, respectively; or

b) using the DNA sequence of 100-350 base pairs located 5′ to the IMH-like sequence to identify a putative TR, and using all or part of this TR and IMH-like sequence to search genome-wide for a matching putative VR and IMH sequence.
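As a non-limiting illustration, the search for an adenine-depleted stretch may be sketched as follows. For simplicity the sketch treats “adenine-depleted” as strictly adenine-free, which is an assumption; a practical search may need to tolerate a small number of adenines. The function name and the example sequence are hypothetical, and the sequence passed in is assumed to be the coding strand near the RT ORF.

```python
import re


def adenine_free_runs(coding_strand: str, min_len: int = 18) -> list:
    """Return (start, end) half-open, 0-based coordinates of maximal
    adenine-free runs of at least `min_len` nucleotides.

    Simplifying assumption: "adenine-depleted" is treated as containing
    no adenine at all.
    """
    pattern = re.compile(r"[CGT]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(coding_strand.upper())]


if __name__ == "__main__":
    # Hypothetical sequence: a 20 nt adenine-free run and a shorter 9 nt run.
    sequence = "A" + "TCGG" * 5 + "A" + "CGT" * 3 + "A"
    print(adenine_free_runs(sequence))
```

Candidate runs found in this way would then be used as queries in step a) or step b) above.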

A potential VR region may be optionally selected for further analysis if present within coding sequence(s) or putative coding sequence(s). A potential TR may be optionally selected based on location in an intergenic region near the RT coding sequence. Of course, sequence alignments of potential TR and VR regions may also be used to confirm their operative linkage, especially if sequence differences occur mainly at adenines. As a non-limiting example, the sequences may be more than about 80%, more than about 85%, more than about 90%, or more than about 95% homologous, with the majority of differences being at the locations of the adenine bases in the TR. As an additional option, the identification of the TR or VR sequences may include searching or identification of sequences that are about 100 to about 350 base-pairs long or longer.
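As a non-limiting illustration, a candidate TR/VR pair may be scored by its percent identity and by the fraction of differences that fall at TR adenines. The sketch below assumes that the two sequences have already been aligned without gaps and are of equal length; the function name and the example sequences are hypothetical.

```python
def compare_tr_vr(tr: str, vr: str) -> tuple:
    """Return (identity, fraction of mismatches located at TR adenines)
    for two pre-aligned, equal-length, gap-free sequences."""
    tr, vr = tr.upper(), vr.upper()
    if len(tr) != len(vr) or not tr:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = [i for i, (t, v) in enumerate(zip(tr, vr)) if t != v]
    at_adenine = [i for i in mismatches if tr[i] == "A"]
    identity = (len(tr) - len(mismatches)) / len(tr)
    adenine_fraction = len(at_adenine) / len(mismatches) if mismatches else 0.0
    return identity, adenine_fraction


if __name__ == "__main__":
    # Made-up 20 nt example in which all four differences sit at TR adenines.
    hypothetical_tr = "GCAACTGGAACGTCAACGGT"
    hypothetical_vr = "GCTCCTGGGACGTCAGCGGT"
    print(compare_tr_vr(hypothetical_tr, hypothetical_vr))
```

A pair with high identity and an adenine fraction close to 1 would be consistent with the operative linkage described above.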

With respect to identifying the IMH-like, or IMH, sequence, searching for a conserved sequence selected from TCGG, TTTTCG, or TTGT at the 3′ ends of possible TR and VR regions may be used. FIG. 9 shows some conserved sequence patterns following the 3′-most nucleotides that vary between TR and VR pairs.

Conserved sequence patterns have been identified as following the 3′-most nucleotides that vary between TR and VR pairs. Comparison of the regions following the VR region (up to or slightly past the position of the VR-containing genes' stop codons) revealed several common features, including 1) the length of the regions ranges from about 18 to about 44 nucleotides (average length of about 38); 2) the regions have no or few adenine nucleotides; 3) nearly all (19/23) begin with a TC or TT followed by a sub-region rich in mono- and di-nucleotide runs; 4) all have one or more mismatches near the 3′ end (up to 5 mismatches in a 9 nucleotide stretch); and 5) the majority (13/23) have a TCTT motif and others (5/23) a similar motif near the 3′ end of the region. Thus IMH and IMH-like sequences of the invention may be designed to possess one or more of these features.
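As a non-limiting illustration, several of the features listed above may be checked for a candidate region as sketched below. The thresholds `max_adenines` and `tail` are hypothetical parameters chosen for illustration, since the description specifies only “no or few” adenines and a motif “near” the 3′ end. The example sequence is the gapless concatenation of the GC-rich region, mismatch segment, and 3′ end given for the BPP1 TR in Table 1.

```python
def imh_features(region: str, max_adenines: int = 2, tail: int = 12) -> dict:
    """Report which of the listed IMH-like features a candidate region shows.

    `max_adenines` and `tail` are illustrative assumptions, not values
    taken from the description.
    """
    r = region.upper()
    return {
        "length_18_to_44": 18 <= len(r) <= 44,
        "few_adenines": r.count("A") <= max_adenines,
        "starts_TC_or_TT": r[:2] in ("TC", "TT"),
        "TCTT_near_3prime_end": "TCTT" in r[-tail:],
    }


if __name__ == "__main__":
    # BPP1 TR row of Table 1, alignment gap removed.
    bpp1_tr_3prime = "TCGGGGCGCGCGGCGTCTGTG" + "CCCATCACC" + "TTCTTG"
    for feature, present in imh_features(bpp1_tr_3prime).items():
        print(feature, present)
```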

The above methods may be in the form of a bioinformatic algorithm to identify DGRs and IMHs. As would be recognized by the skilled person, the above methods may be embodied in the form of a computer readable medium (such as software).

As one alternative, the BPP1 brt protein sequence may be used to search for homologs in the protein database using PSI-BLAST. Brt homologs from previously identified, putative DGRs may be used for a second iteration search, and top hits may be examined further for TR and IMH-like sequences in the vicinity of the RT coding sequence. In some embodiments, genomic regions of about 2000 to 5000 bp upstream and downstream from the RT coding sequence in the genomes of organisms with closely related RT genes may be searched for direct repeats, such as for ≦ repeats of >50 nt long. Potential TR and VR regions may be identified if repeats occurred at the 3′-end of an upstream gene and in the intergenic region upstream of the RT gene. Sequence alignment of putative TR and VR regions identified putative DGRs if sequence differences occurred mainly at adenines. The 3′ ends of the putative TR and VR regions may be examined for conserved IMH and IMH-like sequence motifs as described above.
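As a non-limiting illustration, a first pass for direct repeats near an RT coding sequence may be sketched with an exact k-mer index. This is a simplification and not the method of any particular repeat-finding program: because TR and VR differ at adenine positions, exact seeds can only flag candidate regions, which would then be aligned and compared in full. The function name, the seed length, and the example sequence are hypothetical.

```python
from collections import defaultdict


def repeated_kmers(sequence: str, k: int = 12) -> dict:
    """Map each k-mer that occurs more than once on the given strand
    to the sorted list of its 0-based start positions."""
    seq = sequence.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


if __name__ == "__main__":
    # Made-up sequence containing one 14 nt direct repeat.
    repeat = "GGTTTTCGTGTTGT"
    sequence = "AAAA" + repeat + "CCCCACAC" + repeat + "TTAA"
    print(repeated_kmers(sequence, k=14))
```

Seeds that cluster at the 3′ end of a gene and again in the intergenic region upstream of the RT gene would mark a candidate VR and TR, respectively.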

The invention further provides at least two pattern classes derived from alignments of the non-varying 3′ ends of TRs and VRs. Cyanobacterial sequences form a highly similar sub-group, while other TR/VR pairs have conserved sequence motifs at one or both ends of the regions with dissimilar internal sequences (see FIG. 15). Stop codons were located at variable distances downstream from conserved sequence motifs in each region.

Non-limiting examples of sequences for site-specific mutagenesis according to the invention are those encoding all or part of a binding partner of a target molecule. Non-limiting examples of binding partners include amylin, THF-γ2, adrenomedullin, insulin, VEGF, PDGF, echistatin, human growth hormone, MMP, fibronectin, integrins, calmodulin, selectins, HBV proteins, HBV antigens, HBV core antigens, tryptases, proteases, mast cell protease, Src, Lyn, cyclin D, cyclin D kinase (Cdk), p16^(INK4), SH2/SH3 domains, SH3 antagonists, ras effector domain, farnesyl transferase, p21^(WAF1), Mdm2, vinculin, components of complement, C3b, C4 binding protein (C4BP), receptors, urokinase receptor, tumor necrosis factor (TNF), TNFα receptor, antibodies (Ab) and monoclonal antibodies (MAb), CTLA4 MAb, interleukins, IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-10, IL-11, IL-12, IL-13, IL-17, interferons, LIF, OSM, CNTF, GCSF, interleukin receptors, IL-1 receptor, c-Mpl, erythropoietin (EPO), the EPO receptor, T cell receptor, CD4 receptor, B cell receptor, CD30-L, CD40L, CD27L, leptin, CTLA-4, PF-4, SDF-1, M-CSF, FGF, and EGF.

In some embodiments of the invention, the binding partner is a bacteriocin (including a vibriocin, pyocin, or colicin), a bacteriophage protein (including a tail component that determines host specificity), capsid or surface membrane component, a ligand for a cell surface factor or an identified drug or diagnostic target molecule.

In additional embodiments, the binding partner may be part of a fusion protein such that it is produced as a chimeric protein comprising another polypeptide. The other polypeptide member of the fusion protein may be selected from the following non-limiting list: bacteriophage tail fibers, toxins, neurotoxins, antibodies, growth factors, chemokines, cytokines, neural growth factors.

In additional embodiments, the binding partner may be a nucleic acid, part of a nucleic acid molecule, or an aptamer.

As described above, the invention also provides for isolated nucleic acid molecules derived from naturally occurring sequences. Such an isolated nucleic acid molecule may be described as comprising a donor template region (TR) and a variable region (VR) wherein said TR is a template sequence operably linked to said VR in order to direct site specific mutagenesis of said VR. Preferably, the molecule is from a bacteriophage but not from Bvg+ tropic phage-1 (BPP-1), Bvg⁻ tropic phage-1 (BMP-1), or Bvg indiscriminate phage-1.

The nucleic acid molecules of the invention may be part of a vector or a pair of vectors that is/are introduced into cells that permit site-specific mutagenesis of the VR and/or support replication of the molecules. Non-limiting examples of vectors include plasmids and virus based vectors, including vectors for phage display that may be used to express a diversified VR sequence. Other non-limiting embodiments are vectors containing VR sequences that have been subjected to the methods of the instant invention and then removed from an operably linked TR, including by preventing the expression of TR, so as to produce without further diversification quantities of the VR-encoded protein for uses including as a diagnostic, prognostic, or therapeutic product.

The instant invention also provides for a “diversified collection” of more than one VR sequence, per se or in the context of a vector, wherein at least two of the VR sequences differ from each other in sequence. In some embodiments, the difference in sequence results in the encoding of a different polypeptide by the VR sequence, but the difference may also be silent or synonymous (different codon encoding the same amino acid) and optionally used in cases where codon optimization is needed to improve expression of the encoded polypeptide. A “diversified collection” may also be referred to as a library or a plurality of VR sequences, per se or in the context of a vector. Thus the invention also provides a plurality or library of nucleic acid molecules as described herein. The plurality or library of molecules may include those wherein the VR has undergone diversification directed by the operably linked TR.

Non-limiting examples of cells that contain the nucleic acids of the invention include bacterial cells that support site-specific mutagenesis of bacteriophages as described herein or eukaryotic cells of any species origin that support mutagenesis and/or production and processing of recombinant mutagenized protein. In some embodiments, yeast or fungal cells may be used. In other embodiments, higher eukaryotic cells may be used.

Having now generally described the invention, the same will be more readily understood through reference to the following examples which are provided by way of illustration, and are not intended to be limiting of the present invention, unless specified.

EXAMPLES

Example 1 Materials and Methods

Bacterial Strains, Phage and Plasmids.

B. bronchiseptica strains were derived from the sequenced RB50 strain (Uhl et al. and Parkhill, J. et al. Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nat Genet 35, 32-40 (2003)) and BPP-1 was induced from a rabbit isolate of B. bronchiseptica (Liu et al. 2002). BMP-1 was isolated from BPP-1 using the tropism switch assay (see below). Plate lysates were prepared using the soft-agar overlay method (Adams, M. H. Bacteriophages. (Interscience Publishers Inc, New York, N.Y., 1959)) and tropism switch assays were performed as described previously (Liu et al. 2002). Bacterial and phage constructs were generated using allelic exchange (Edwards, R. A., Keller, L. H., Schifferli, D. M. Improved allelic exchange vectors and their use to analyze 987P fimbria gene expression. Gene 207, 149-157 (1998) and Figurski, D. H. & Helinski, D. R. Replication of an origin-containing derivative of plasmid RK2 dependent on a plasmid function provided in trans. Proc Natl Acad Sci USA 76, 1648-1652 (1979)).

Multiple Substitution Constructs.

BPP-MS1 and BMP-MS1 (FIGS. 3 a and 3 b) are BPP-1 and BMP-1 derivatives, respectively, containing the following synonymous substitutions in TR: T7-A (PstI), G37-T (BstXI), C55-A (XhoI), C79-G (ApaI) and G100-C (NlaIII). Each substitution generates a unique restriction site as indicated. The substitutions at positions 37 and 100 eliminate AflIII and MboII restriction sites, respectively, allowing in vitro selections for variability (FIG. 3 b). Phage MS2 (FIG. 3 c) is a BPP-1 derivative containing a 1 bp deletion at position 106 in TR and substitutions: T7-A, G37-T, C55-A, C79-G. Phage MS3 (FIG. 3 c) is a BMP-1 derivative containing a 1 bp deletion at position 9 in TR and substitutions: G37-T, C55-A, C79-G and G100-C.

In vitro Variability Assays.

In vitro variability assays select for transfer, from TR to VR, of single nucleotide substitutions that confer resistance to restriction enzyme cleavage. Lysogens were induced with mitomycin C, and VR sequences were amplified by PCR and digested with the appropriate restriction enzymes. The amplification-restriction cycle was repeated with nested primers until no further cutting was observed, and the products were cloned into pBluescript KS+ vector (Stratagene) for sequencing. Variability in TR due to “self-homing” (FIG. 1 c, BPP-3′VR) was assayed using BsrI, which cleaves the parental TR but not TR sequences with adenine modifications that confer resistance. For the multiple substitution experiments in FIG. 3 b, amplification products were purified and digested with AflIII for BMP-MS1 phage or MboII for BPP-MS1 phage. In both cases, parental VR sequences are cleaved, whereas VR sequences that have acquired from TR the synonymous substitution eliminating the restriction site are resistant.
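The selection logic of these assays can be sketched in a few lines of Python. This is a minimal illustration only, not the protocol used in the examples: the two toy amplicon sequences and the helper names (has_site, resistant) are hypothetical. The only outside facts assumed are the standard recognition sequences of AflIII (ACRYGT) and MboII (GAAGA, non-palindromic, so both strands are scanned).

```python
import re

# IUPAC degenerate codes needed for the recognition sites used here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_site(seq, site):
    """True if the IUPAC recognition site occurs on either strand of seq."""
    pattern = re.compile("".join(IUPAC[b] for b in site.upper()))
    seq = seq.upper()
    return bool(pattern.search(seq) or pattern.search(revcomp(seq)))


def resistant(amplicons, site):
    """Names of amplicons that lack the site and so survive digestion."""
    return [name for name, seq in amplicons.items() if not has_site(seq, site)]


AFLIII = "ACRYGT"  # AflIII recognition sequence
MBOII = "GAAGA"    # MboII recognition sequence (cuts at a distance)

# Hypothetical toy amplicons: the variant carries a single G->T change
# that destroys the AflIII site present in the parental sequence.
amplicons = {
    "parental": "GGCTACGTGTCCAA",
    "variant": "GGCTACTTGTCCAA",
}

print(resistant(amplicons, AFLIII))
```

Repeating the digest and re-amplifying only the uncut molecules, as described above, corresponds to applying resistant() to each successive pool.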

Bioinformatics.

Annotated BPP-1 sequence is available under GenBank accession number AY029185. Database entries containing the conserved reverse transcriptase catalytic domain (Pfam 00078 rvt) were compiled and phylogenetic profiles were constructed using the PHYLIP software package (at evolution.genetics.washington.edu/phylip.html). Entries that grouped together with Brt were searched for the presence of direct repeats proximal to the RT using the REPuter program (Kurtz, S. et al. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29, 4633-4642 (2001)). Artemis software was used to collect data and facilitate annotation (Rutherford, K. et al. Artemis: sequence visualization and annotation. Bioinformatics 16:944-945 (2000)).

Example 2 Multiple Synonymous Substitutions

A genetic strategy for tracking events that give rise to sequence variants was designed based on the observation that conservative nucleotide substitutions in TR are incorporated into VRs of phages that have switched tropism. By introducing multiple synonymous substitutions positioned along TR, the portion of TR transferred during a switching event can be determined by recording the pattern of substitutions appearing in VR. Mechanistic events that underlie tropism switching can then be reconstructed from the resulting “haplotype” profiles.
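The haplotype bookkeeping described above can be sketched as follows. The marker positions and substitutions are those listed for the multiple substitution constructs in Example 1. Everything else is an assumption made for illustration: the toy VR sequences are hypothetical, the function names are invented here, and VR positions are assumed to align one-to-one with TR positions.

```python
# TR marker substitutions: 1-based position -> (parental base, marker base).
MARKERS = {7: ("T", "A"), 37: ("G", "T"), 55: ("C", "A"),
           79: ("C", "G"), 100: ("G", "C")}


def haplotype(vr_seq):
    """One character per marker, 5' to 3': '1' = TR-derived marker present,
    '0' = parental base, '?' = neither (for example, a diversified adenine)."""
    calls = []
    for pos in sorted(MARKERS):
        parental, marker = MARKERS[pos]
        base = vr_seq[pos - 1].upper()
        calls.append("1" if base == marker else "0" if base == parental else "?")
    return "".join(calls)


def transfer_frequencies(clones):
    """Fraction of clones carrying each marker, ordered 5' to 3'."""
    profiles = [haplotype(c) for c in clones]
    return [sum(p[i] == "1" for p in profiles) / len(profiles)
            for i in range(len(MARKERS))]


def toy_vr(transferred):
    """Hypothetical 110-nt VR in which only the marker positions are defined."""
    seq = ["N"] * 110
    for pos, (parental, marker) in MARKERS.items():
        seq[pos - 1] = marker if pos in transferred else parental
    return "".join(seq)


clones = [toy_vr({79, 100}), toy_vr({7, 37, 55, 79, 100}), toy_vr({100})]
print([haplotype(c) for c in clones])
print(transfer_frequencies(clones))
```

In this toy set the 3′-most marker is present in every clone while the 5′ markers appear in only a subset, which is the kind of asymmetry a transmission histogram summarizes.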

Information was not transmitted evenly across VR. FIG. 3 a shows the patterns of transmission accompanying BPP→BMP/BIP or BMP→BPP tropism switching. In both cases, 3′ markers were transmitted with 100% efficiency whereas 5′ markers were transmitted at frequencies approaching 50%. Variability at adenines correlated with the transfer of proximal substitutions, while lack of variability correlated with their absence. In several cases, mosaic patterns were observed in which stretches of variable, TR-derived sequence were interrupted by non-variant, VR-derived sequence (bullets, FIG. 3 a). Together, these results argue against a simple cut-and-paste mechanism as commonly observed in transposition reactions (Pena, C. E., Kahlenberg, J. M., Hatfull, G. E. Assembly and activation of site-specific recombination complexes. PNAS 97, 7760-5 (2000) and Hallett B., Sherratt, D. J. Transposition and site-specific integration: adapting DNA cut-and-paste mechanisms to a variety of genetic rearrangements. FEMS Microbiol Rev 21, 157-78).

Because the sequence determinants that govern receptor specificity are unclear, tropism switching assays are inherently biased by a powerful, yet poorly defined, set of selective pressures. Substitution patterns were therefore recorded using PCR-based in vitro assays that select for variability at single, precisely defined positions, with no selection for tropism switching or phage infectivity. These assays are based on the loss of restriction sites in parental VR sequences that result from the transmission of synonymous substitutions in TR.

As shown in FIG. 3 b, in vitro variability assays revealed selection-specific patterns of marker transfer in which AflIII-selected clones preferentially transferred the middle portion of TR containing the selected site (position 37), and transfer frequencies precipitously fell in either direction. MboII-selected clones transferred the 3′ end of TR which contains the selected site (position 100), but were indifferent to sequence variation at the 5′ end. In both cases, maximal frequencies of marker transfer were shifted to the exact point of selection. The majority of events displayed either interrupted patterns of transmission or patches of transmission flanked by invariant sequence (bullets, FIG. 3 b).

Despite the lack of selection for mutagenesis, all of the VR sequences in FIG. 3 b contain adenine-substitutions. To further probe the extent of plasticity, a strong negative selection against transfer of the 3′ or 5′ boundaries of TR was imposed. This was accomplished by the introduction of frameshift mutations which, if transferred, produce non-viable phage.

The system surprisingly accommodated these rather extreme selections. Both mutant phages were able to switch tropism while avoiding the transmission of frameshift mutations, generating transmission histograms that are essentially mirror images (FIG. 3 c and FIG. 8).

Example 3 Gene Conversion

Selection at a single position, as imposed by in vitro restriction enzyme-based assays, tends to isolate shorter variable sequences centered around the point of selection. More complex selections for novel receptor specificity select for larger segments of transferred, mutagenized sequence (FIG. 4 a).

These conditions could be satisfied by a mechanism in which site-specific homing, initiated at IMH, is followed by random gene conversion due to recombination or repair. According to this model, a heteroduplex is formed at VR during the variability generating process (FIG. 4 b). See Morrish, et al. and Wank, et al. The heteroduplex would be characterized by a high density of mismatched basepairs resulting from the hybridization of VR with a TR-derived cDNA. Mismatch repair, or an analogous process, would give rise to chimeric VRs containing “patches” of sequence variation.

A consequence of the diversity-generating mechanism is that variability is introduced into the mtd locus in a highly targeted manner. Diversification exclusively occurs within the boundaries of the variable repeat, it only occurs at positions corresponding to adenine residues in TR, and it can be limited to the subset of bases that are subject to selection. This “focusing” of variability has the potential to be highly adaptive as it provides a means to efficiently respond to selective pressures while minimizing the accumulation of unnecessary or deleterious substitutions. The repair step may be essential given the high rate of adenine-mutagenesis, and it allows optimization of receptor specificity through iterative rounds of selection (Wrighton, N. C. et al. Small peptides as potent mimetics of the protein hormone erythropoietin. Science 273, 458-64 (1996) and Fairbrother, W. J. et al. Novel peptides selected to bind vascular endothelial growth factor target the receptor-binding site. Biochemistry 37, 17754-17764 (1998)).

Example 4 Related Gene Diversification Systems in Other Organisms

The ability to diversify protein domains involved in ligand-receptor interactions has extremely broad utility. The invention thus provides elements homologous to the Bordetella phage retroelement as discovered from other sources in nature. To identify related sequences, open reading frames (ORFs) of bacterial origin containing conserved RT domains were compiled. A subset clustered phylogenetically with Brt (FIG. 2 a). Adjacent sequences were examined and in all cases candidate TR and VR repeats were identified, with VRs located at the 3′ end of an ORF.

Further annotation revealed an array of cassettes which we now designate as putative diversity generating retroelements (DGRs). Although RT domains are highly related, and DGRs share an overall conservation of structural features (FIG. 4 b), there is little if any sequence similarity between other components of these related cassettes.

In every case VR analogs differ from their cognate TRs almost exclusively at positions corresponding to adenines (FIG. 9). This observation supports the use of these cassettes based on their function to generate diversity in a similar manner.

Comparison of the 3′ ends of cognate VRs and TRs also suggests the presence of analogous sequences to the Bordetella phage IMH site (FIG. 9). As shown in FIG. 2 b, DGRs are found in the chromosomes of a wide array of bacterial species and they display variations on a common theme. For example, Nostoc and Trichodesmium species contain cassettes in which a single TR apparently supplies two different VRs with sequence variability. In such cases, the VRs are part of paralogous ORFs with over 90% sequence identity and are identical except for bases corresponding to adenines in TR.

In addition, several cyanobacterial species contain multiple DGRs which are not homologous and have, therefore, been independently acquired. Although the Bordetella and V. harveyi cassettes are present on prophage genomes, there is no evidence of phage association for the remaining sequences. On the basis of the data in FIG. 2, it is proposed that DGRs have evolved to perform myriad functions in diverse organisms.

Retroelements such as group II introns (Bonen, L. & Vogel, J. The ins and outs of group II introns. Trends Genet 17, 322-331 (2001)), retrotransposons (Bushman, F. D. Targeting survival: integration site selection by retroviruses and LTR retrotransposons Cell 115, 135-138 (2003)), retroviruses (Gifford, R. & Tristem, M. The evolution, distribution and diversity of endogenous retroviruses. Virus Genes 26, 291-315 (2003)), and human LINEs (Kazazian, H. H. Jr. & Goodier, J. L. LINE drive: retrotransposition and genome instability Cell 110, 277-280 (2002)) share related characteristics.

Example 5 Elements of the DGR Which Act in cis and trans

Bordetella strain 61-11 (RB50 BPP-1 Δbrt, see FIG. 10) was used to characterize cis and trans acting elements of the DGR. The strain carries a deletion in the prophage RT gene (brt) which renders the phage unable to switch its tropism.

DNA fragments containing various components of the DGR were amplified by PCR from the intact RB50 BPP-1 lysogen, digested with restriction enzymes and cloned into the vector pBBRmcsF carrying an fha promoter. Two of the plasmids, pfhaP-atd-TR-brt and pfhaP-TR-brt, are shown in Table 1 and schematically in FIG. 10. In pfhaP-atd-TR-brt, there is no terminator between the fhaP and atd-TR-brt sequences. In pfhaP-TR-brt, there is no atd between the fha promoter and the TR sequence.

The resulting constructs are listed in Table 2.

TABLE 2

Plasmid           Description                                                Primers used for constructions   Causes tropism switching in strain 61-11
pfhaP-atd-TR-brt  the atd-TR-brt region in DGR is cloned in vector pBBRmcsF  atd-TRHindIII for, BrtBamHrev    yes
pfhaP-TR-brt      the TR-brt region in DGR is cloned in vector pBBRmcsF      TRHindIII for, BrtBamHrev        yes
pfhaP-brt         only the brt region in DGR is cloned in vector pBBRmcsF    BrtXbal for, BrtSacl rev         no
pfhaP-atd-TR      the atd-TR region in DGR is cloned in vector pBBRmcsF      (not listed)                     no

After transformation of plasmids into strain 61-11, tropism switching was assayed by inducing lysogenic cells with mitomycin C and plating the phage lysate directly onto RB53 (a Bvg+ strain) or RB54 (a Bvg− strain) to observe plaque formation (see FIG. 11 for a representation of the use with pfhaP-atd-TR-brt).

Induced lysate from cells harboring the pfhaP-atd-TR-brt was plated directly on RB53(Bvg+) or RB54(Bvg−). Eighteen (18) plaques from RB53 plates were isolated and, after PCR amplification, their VR regions were sequenced and found to have changes at positions corresponding to adenines in the TR. From 100 μl of lysate, an average of 15 plaques were seen by plating directly on RB54 (efficiency compared to plating on RB53 is about 10⁻³).

In parallel, induced lysate from cells harboring the pfhaP-TR-brt was also directly plated on RB53(Bvg+) or RB54(Bvg−). Phages from 10 plaques from RB53 plates were isolated, and their VR regions were amplified by PCR and sequenced. All had changes in VR regions corresponding to adenines in the TR. From 100 μl of lysate, an average of 15 plaques were seen by plating directly on RB54 (efficiency compared to plating on RB53 is about 10⁻³). Because the VR regions of all of the phages, even those that did not switch tropism, contained nucleotide changes corresponding to adenine residues in the TR, the frequency of mutagenesis was effectively 100% with the use of a strong heterologous promoter.

Similar experiments with pfhaP-brt and pfhaP-atd-TR showed no tropism switching.

The results show that the minimal unit required for complementation of the brt deletion, restoring the ability to switch tropism, is the TR-brt region, in which:

(i) The TR acts in cis with brt

(ii) The TR acts in trans to the VR

The results further suggest that the trans acting construct was able to direct the mutagenesis of a proviral copy of the phage VR sequence.

Example 6 Mutagenesis in trans of an Uninduced Prophage

The ability of trans expression of TR-brt to alter the VR sequence of a (chromosomal) prophage was determined in the absence of phage induction. In the uninduced lysogen 61-11 harboring the plasmid pfhaP-atd-TR-brt, PCR amplification was performed on an overnight culture and DNA products were cloned into a sequencing vector.

In one experiment, one colony was picked and grown by overnight culture in LB medium at 37° C. The VR region of 5 μl overnight culture was PCR amplified and cloned into a sequencing vector (pBluescript II). Twenty plasmids were sequenced, with 2 (thus 10%) having changes in the VR corresponding to adenines in the TR. In another experiment, 5 colonies were picked and individually grown via overnight culture in LB medium at 37° C. The VR region of 5 μl of each overnight culture was PCR amplified and cloned into a sequencing vector (pBluescript II). Three (3) plasmids from each plating were sequenced, with 5 of 15 (thus about 33%) having changes in the VR corresponding to adenines in the TR.

Thus, fha promoter-directed transcription of the TR-brt region results in elevated levels of VR mutagenesis, demonstrating that:

(i) TR-brt transcription can be placed under the control of a heterologous promoter, replacing the need for the atd element (see below)

(ii) Control of TR-brt transcription affects the levels of VR mutagenesis

(iii) The TR-brt region can act in trans on a cognate VR in the bacterial chromosome

Example 7 Introduction of Added Sites of Mutagenesis

Using site-directed mutagenesis, 3 adenines were substituted for nucleotides 59-61 of the TR region. The corresponding VR nucleotides encoded non-variable Mtd residue A356. Using homologous recombination, the TR with 3 adenines substituted was introduced into strain 6405 (RB54 BMP-1 lysogen). Successful modification of the 6405 TR was confirmed by sequencing and restriction digestion, generating strain 6405AAA (see below).

TR—strain 6405 cgctgctgcgctattcggcggcaactggaacaacacgtcgaactcgggtt ctcgcgctGCGaactggaacaacgggccgtcgaactcgaacgcgaacatc ggggcgcgcggcgtctgtgcccatcaccttcttg

TR—strain 6405AAA cgctgctgcgctattcggcggcaactggaacaacacgtcgaactcgggtt ctcgcgctAAAaactggaacaacgggccgtcgaactcgaacgcgaacatc ggggcgcgcggcgtctgtgcccatcaccttcttg
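As a consistency check on the two sequences above, the following Python sketch rebuilds the 6405AAA template region from the parental 6405 sequence by replacing nucleotides 59-61 and confirms that these are the only positions that differ. The sequences are copied from the listings above; the helper names are hypothetical.

```python
# TR sequences as listed above (6405 parental and 6405AAA mutant).
TR_6405 = (
    "cgctgctgcgctattcggcggcaactggaacaacacgtcgaactcgggtt"
    "ctcgcgctGCGaactggaacaacgggccgtcgaactcgaacgcgaacatc"
    "ggggcgcgcggcgtctgtgcccatcaccttcttg"
)
TR_6405AAA = (
    "cgctgctgcgctattcggcggcaactggaacaacacgtcgaactcgggtt"
    "ctcgcgctAAAaactggaacaacgggccgtcgaactcgaacgcgaacatc"
    "ggggcgcgcggcgtctgtgcccatcaccttcttg"
)


def substitute(seq, start, end, new):
    """Replace 1-based inclusive positions start..end of seq with new."""
    if len(new) != end - start + 1:
        raise ValueError("replacement must match the length of the span")
    return seq[:start - 1] + new + seq[end:]


def differing_positions(a, b):
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]


mutant = substitute(TR_6405, 59, 61, "AAA")
print(mutant == TR_6405AAA)
print(differing_positions(TR_6405, TR_6405AAA))
print(mutant.upper().count("A") - TR_6405.upper().count("A"))
```

The last line reports how many adenines the substitution adds to the template, that is, how many new candidate sites for adenine-directed diversification it creates.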

Strain 6405AAA was induced, VR regions of the resulting phage mixture were PCR amplified and digested with a restriction enzyme (MboII) that cuts the parental VR sequence 3′ to the AAA substitution. The in vitro selection was for diversification of the parental MboII recognition sequence without assessing its effect on the encoded polypeptide. Re-amplification of VR sequences undigested by MboII followed by cloning and sequencing demonstrated that the newly introduced TR adenine residues were transmitted to VR and diversified.

Example 8 The atd is not Required for Homing Mutagenesis

Placement of a stop codon into atd does not eliminate mutagenesis. This indicates that the atd does not encode a protein required for mutagenesis.

Using site-directed mutagenesis, a stop codon was substituted for the 9th amino acid of the postulated accessory tropism determinant (atd) ORF. Using homologous recombination, the atd with a stop codon was introduced into lysogen strain 6405. Successful modification of the 6405 was confirmed by sequencing.

After induction and an additional round of propagation, phages able to plaque on either Bvg+ or Bvg− Bordetella bronchiseptica were isolated. Therefore, the phage maintained the ability to switch tropism. In addition, the primary induction of phage produced variants. This was shown by selecting for variants in the primary lysate using altered sensitivity to restriction digest in a restriction enzyme/PCR selection method.

Combined with the results of Example 5 above, one can conclude that an atd encoded polypeptide is not required for tropism switching and the atd sequence can be entirely substituted by a heterologous promoter.

atd—Wild type atggaacccatcgaggaagcgacaAAGtgctacgaccaaatgctcattgt ggaacggtacgaaagggttatttcgtacctgtatcccattgcgcaaagca tcccgaggaagcacggcgttgcgcgggaaatgttcctgaagtgcctgctc gggcaggtcgaattattcatcgtggcgggcaagtccaatcaggtgagcaa gctgtacgcagcggacgccgggcttgccatgctgcgattttggttgcgct ttctcgcgggcattcagaaaccgcacgctatgacgccgcatcaggtcgag acagcacaagtgctcatcgccgaagtggggcgcattctcggctcctggat tgcccgcgtgaatcgcaaagggcaggctgggaaataa

atd—with stop codon atggaacccatcgaggaagcgacaTAGtgctacgaccaaatgctcattgt ggaacggtacgaaagggttatttcgtacctgtatcccattgcgcaaagca tcccgaggaagcacggcgttgcgcgggaaatgttcctgaagtgcctgctc gggcaggtcgaattattcatcgtggcgggcaagtccaatcaggtgagcaa gctgtacgcagcggacgccgggcttgccatgctgcgattttggttgcgct ttctcgcgggcattcagaaaccgcacgctatgacgccgcatcaggtcgag acagcacaagtgctcatcgccgaagtggggcgcattctcggctcctggat tgcccgcgtgaatcgcaaagggcaggctgggaaataa
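The position of the introduced stop codon can be checked directly from the two atd listings above. The sketch below is illustrative only: it uses just the first 50 nucleotides of each listing, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(orf):
    """1-based index of the first in-frame stop codon, or None if absent."""
    orf = orf.upper()
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


# First 50 nucleotides of the wild-type and stop-codon atd sequences above.
ATD_WT_5PRIME = "atggaacccatcgaggaagcgacaAAGtgctacgaccaaatgctcattgt"
ATD_STOP_5PRIME = "atggaacccatcgaggaagcgacaTAGtgctacgaccaaatgctcattgt"

print(first_stop_codon(ATD_WT_5PRIME))
print(first_stop_codon(ATD_STOP_5PRIME))
```

The wild-type prefix contains no in-frame stop, whereas the modified sequence terminates at the ninth codon, consistent with the substitution described in this example.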

Example 9 Diversification of a Heterologous Polypeptide

A kanamycin resistance gene encoding aminoglycoside-3′-phosphotransferase-II (APH(3′)-IIa) with its own promoter was isolated from plasmid pZS24*luc using restriction enzymes SacI and XbaI and cloned into plasmid pBBRmcs (FIG. 12). The E. coli strain XL1-blue carrying this new plasmid pBBR-Kan was able to grow in presence of both kanamycin and chloramphenicol.

The amino acid sequence of APH(3′)-IIa is 264 residues long and is as follows: M I E Q D G L H A G S P A A W V E R L F G Y D W A Q Q T I G C S D A A V F R L S A Q G R P V L F V K T D L S G A L N E L Q D E A A R L S W L A T T G V P C A A V L D V V T E A G R D W L L L G E V P G Q D L L S S H L A P A E K V S I M A D A M R R L H T L D P A T C P F D H Q A K H R I E R A R T R M E A G L V D Q D D L D E E H Q G L A P A E L F A R L K A R M P D G E D L V V T H G D A C L P N I M V E N G R F S G F I D C G R L G V A D R Y Q D I A L A T R D I A E E L G G E W A D R F L V L Y G I A A P D S Q R I A F Y R L L D E F F. The Leu residue at position 243 is shown with emphasis.

A stop codon (taa) was introduced into position 243 by using site-directed mutagenesis. The mutation eliminated kanamycin resistance in a host harboring the plasmid pBBR-Kan. Plasmid pZS24*luc is from: Lutz, R. & Bujard, H. (1997) Nucleic Acids Res. 25, 1203-1210.

The kanamycin resistance gene was PCR-amplified and digested with restriction enzymes (KpnI and HindIII). The DNA fragment was placed 5′ to the atd-TR-brt region in the plasmid pfhaP-atd-TR-brt. The resulting plasmid is pKan-atd-TR-brt, which carries a deletion of the transcription terminator structure upstream of the atd (see FIG. 13).

The designed VR region for the kanamycin resistance gene (APH(3′)-IIa) includes the last 75 bp in the gene (encoding 25 residues ending with Phe) followed by a stop codon tga and 55 bp from the end of gene mtd (see FIG. 14). The 55 bp mtd region, shown with a hypothetical encoded peptide sequence, includes 14 bp of the GC rich region (underlined in FIG. 14) followed by the IMH sequence. The mtd region was PCR-amplified with oligos carrying the flanking regions complementary to each side of the insertion position in plasmid pKan-atd-TR-brt at the 5′ end. The PCR product was purified and used as primers for a modified site-directed mutagenesis on plasmid pKan-atd-TR-brt. The resulting plasmid is pKan-IMH-atd-TR-brt (FIG. 13).

The designed TR′ region for the kanamycin resistance gene is shown below in alignment with its cognate VR region. A 130 bp region corresponding to the VR is shown with the codon corresponding to Leu243 capitalized. The last 55 bp is the same as the TR region in the BPP-1 DGR region and is capitalized for emphasis.

TR′          aacctcgtgaatTACggtaacgccgctcccgataagcagcgcatcgccaactatcgcctt
243amVR      ttcctcgtgcttTAAggtatcgccgctcccgattcgcagcgcatcgccttctatcgcctt
243resis1VR              tac
243resis2VR              ttc

TR′          cttgacaagaacttctgaTCGAACTCGAACGCGAACATCGGGGCGCGCGGCGTCTGTGCC
243amVR      cttgacgagttcttctgaTCGTTCTCGTTCGCGTTCTTCGGGGCGCGCGGCGTCTGTGAC

TR′          CATCACCTTCTTG
243amVR      CACCTGATTCTTG

The TR′ region for the kanamycin resistance gene in plasmid pKan-IMH-atd-TR-brt was made by modified site-directed mutagenesis. The final plasmid is pKan-IMH-atd-TR′-brt (FIG. 13) or pKan-TR′. An amber stop codon was introduced into the kanamycin resistance gene at position 243 by site-directed mutagenesis to produce pKan243am-IMH-atd-TR′-brt (also referred to as pKan243-TR′).

The plasmid was transformed into lysogen 61-11. The lysogen with plasmid pKan-TR′ grew normally in the presence of kanamycin.

Selection of kanamycin resistance with pKan243-TR′ was as follows. A culture of lysogen 61-11 carrying plasmid pKan243-TR′ was grown overnight followed by serial dilution. The dilutions were plated on LB plates with 40 μg/ml kanamycin. The 61-11 hosts harboring kanamycin resistant plasmids that have “repaired” the amber stop codon by adenine-specific mutagenesis would be expected to form colonies in the presence of kanamycin. Two robust colonies, 243resis1VR and 243resis2VR, from the plate of hosts harboring pKan243-TR′ were isolated, and their VR regions were amplified and sequenced.

The results are as shown in the alignment above, where 243resis1VR contained a taa to tac (Tyr) change; tac is the same codon sequence as that in TR′. This indicates that the TR′ sequence was used to substitute for the VR sequence. Stated differently, the change was the result of sequence substitution from the TR′ to the VR.

In 243resis2VR, taa was changed to ttc (Phe), the result of 2 mutations in the same codon. One of the 2 mutagenic events was an A to T change resulting from diversification of the corresponding A in TR′, while the A to C change was a substitution (or homing) from the TR′ template as seen for 243resis1VR. Phe and Tyr have very similar structures, differing only by the hydroxyl group on the aromatic ring of Tyr, and both have largely hydrophobic side chains. The results show that a Tyr or Phe at position 243, which is Leu (also hydrophobic) in the native sequence, was able to restore kanamycin resistance. This suggests that position 243 tolerates a Leu to Tyr or Phe substitution for maintenance or restoration of phosphotransferase function.
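The reasoning applied to the two resistant clones can be written out as a small classification routine. This is a sketch under stated assumptions: the codons are those reported in this example (TR′ codon tac, parental VR codon taa, recovered codons tac and ttc), while the function name, the event labels and the three-entry codon table are hypothetical conveniences.

```python
# Only the codons needed for this example (standard genetic code).
CODON_TABLE = {"TAA": "Stop", "TAC": "Tyr", "TTC": "Phe"}


def classify_changes(tr, parental_vr, selected_vr):
    """Classify each changed VR position relative to the TR template.

    'homing'                  -> new VR base equals the TR base
    'adenine diversification' -> TR base is A and the new VR base differs from it
    'other'                   -> anything else
    """
    events = []
    triples = zip(tr.upper(), parental_vr.upper(), selected_vr.upper())
    for pos, (t, p, s) in enumerate(triples, start=1):
        if s == p:
            continue  # position unchanged in the selected clone
        if s == t:
            events.append((pos, "homing"))
        elif t == "A":
            events.append((pos, "adenine diversification"))
        else:
            events.append((pos, "other"))
    return events


TR_CODON, VR_CODON = "tac", "taa"
for name, codon in [("243resis1VR", "tac"), ("243resis2VR", "ttc")]:
    print(name, CODON_TABLE[codon.upper()], classify_changes(TR_CODON, VR_CODON, codon))
```

For 243resis1VR the single change at codon position 3 is classified as homing; for 243resis2VR position 2 is classified as adenine diversification and position 3 as homing, matching the interpretation given above.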

As shown by Nurizzo et al. (J. Mol. Biol., 327:491-506, 2003), the C-terminal domain of the kanamycin resistance protein is involved in binding the kanamycin molecule. According to their published crystal structure, the L243 to amber mutation truncates the protein prior to alpha helices 7 and 8. This leads to loss of C-terminal residues 260-264, which form part of the kanamycin-binding pocket. Thus the sequence changes from a stop codon to those in 243resis1VR and 243resis2VR reflect restoration of the kanamycin binding domain of the phosphotransferase.

The above results also indicate that the IMH does not need to be translated for mutagenesis to occur because the IMH follows a tga stop codon in the above kanamycin phosphotransferase constructs. The above described results may also be performed with a trans construct which provides the TR and RT coding sequences under the control of a separate promoter on a second molecule.

Example 10 Identification of a DGR from T. denticola

Treponema denticola is a motile, anaerobic spirochete that colonizes the human oral cavity and has been associated with gum disease. There is a 134 base pair identified variable region (VR) located at the 3′ end of open reading frame TDE2269. A corresponding template region (TR) is located 199 base pairs downstream of the VR and 573 base pairs upstream of a reverse transcriptase coding sequence that bears homology (6e-39) to the Bordetella phage reverse transcriptase (brt). The VR and TR differ at 26 positions, with 23 of those differences occurring in the VR at positions that correspond to adenines within the TR. Two of the three positions that do not correspond to adenines may be a part of the IMH signal since they are the most 3′ positions of variability (see below). Also, TDE2269 has a lipoprotein signal sequence (underlined below) indicating that this protein may be exported to the outer membrane. The VR is shown in bolded text below.

TDE2269-329 Amino Acids MKNTNSKLKTKVLNRAISITALLLAAGVLLTGCPTGQGKSGGGESSEVTP NTPVDKTYTVGSVEFTMKGIAAVNAQLGHNDYSINQPHTVSLSAYLIGET EVTQELWQAVMGNNPSHFNGSPAVGETQGKRPVENVNWYQAIAFCNKLSI KLNLEPCYTVNVGGNPVDFAALSFDQIPDSNNADWDKAELDINKKGFRLP TEAEWEWAAKGGTDDKWSGTNTEAELKNYAWYGSNSGSKTHEVKKKKPNW YGLYDIAGNVAEWCWDWRADIHTGDSFPQDYPGPASGSGRVLRGGSWAGS ADYCAVGERVNISPGVRCSDLGFRLACRP

To confirm variation in the VR corresponding to adenines in the TR, the restriction enzyme HinCII was used in a variability assay to identify a T. denticola VR that differs from the sequenced VR at 25 nucleotide positions. Twenty-one of the 25 differences occur at positions that correspond to adenines within the TR, and one of the remaining four differences appears to be a direct nucleotide transfer (or homing) from the TR as shown below.

The HinCII recognition site is GTYRAC where Y is C or T; and R is A or G. TR stands for Template Region; VR stands for Variable Region; and IV stands for Identified Variant of Variable Region. A portion of presumptive IMH-like and IMH sequences of TR and VR, respectively, are shown in bold type.

TR: CCGCGTCAGGCTCTAACCGTGTTAAACGCGGCGGCAGCTGGAACAACAACGCGAACAA
VR: CCGCGTCAGGCTCTGGCCGTGTTTTACGCGGCGGCAGCTGGGCCGGCAGCGCGGACTA
IV: ------------------------------------------A-AA-TA------GGG

TR: CTGCACTGTAGGCAAACGGAATAACAACAGTCCTGACAACAGGAACAACAATCTTGGC
VR: CTGCGCTGTAGGCGAACGGGTCAACATCAGTCCTGGCGTCAGGTGCAGCGATCTTGGC
IV: ----A----G---ACC----GT---GG--AC------AA----G---A-CT-------

TR: TTCCGCTTGGCTTGTCGGCC
VR: TTCCGCCTGGCTTGCCGGCC
IV: --------------------
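The TR/VR comparison for TDE2269 can be reproduced from the aligned sequences with a short script. This is a sketch only: the sequences are the TR and VR rows as printed in the alignment above, concatenated across the three blocks, and the function name is hypothetical.

```python
# TR and VR rows of the alignment above, concatenated block by block.
TR = ("CCGCGTCAGGCTCTAACCGTGTTAAACGCGGCGGCAGCTGGAACAACAACGCGAACAA"
      "CTGCACTGTAGGCAAACGGAATAACAACAGTCCTGACAACAGGAACAACAATCTTGGC"
      "TTCCGCTTGGCTTGTCGGCC")
VR = ("CCGCGTCAGGCTCTGGCCGTGTTTTACGCGGCGGCAGCTGGGCCGGCAGCGCGGACTA"
      "CTGCGCTGTAGGCGAACGGGTCAACATCAGTCCTGGCGTCAGGTGCAGCGATCTTGGC"
      "TTCCGCCTGGCTTGCCGGCC")


def compare_tr_vr(tr, vr):
    """Return (all differences, differences at TR adenines).

    Each difference is a tuple (1-based position, TR base, VR base).
    """
    if len(tr) != len(vr):
        raise ValueError("TR and VR must be aligned to the same length")
    diffs = [(i + 1, t, v) for i, (t, v) in enumerate(zip(tr, vr)) if t != v]
    at_adenine = [d for d in diffs if d[1] == "A"]
    return diffs, at_adenine


diffs, at_adenine = compare_tr_vr(TR, VR)
non_adenine = [d for d in diffs if d[1] != "A"]
print(len(diffs), len(at_adenine))
print(non_adenine)
```

Run on the printed alignment, the script finds 26 differences, 23 of them at TR adenines, in agreement with the counts stated in this example. Two of the three non-adenine differences fall in the final block, at the 3′ end of the repeat.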

All references cited herein are hereby incorporated by reference in their entireties, whether previously specifically incorporated or not. As used herein, the terms “a”, “an”, and “any” are each intended to include both the singular and plural forms.

Having now fully described this invention, it will be appreciated by those skilled in the art that the same can be performed within a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation. While this invention has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains and as may be applied to the essential features hereinbefore set forth. 

1. A single recombinant nucleic acid molecule or pair of nucleic acid molecules comprising a variable region (VR) operably linked to a donor template region (TR) wherein said TR is operably linked to a reverse transcriptase coding sequence and is a template sequence that directs site-specific mutagenesis of said VR, and wherein said single molecule is not a derivative, containing only one or more deletion mutations, of the major tropism determinant (mtd) gene, the atd region, and/or the brt coding sequence, of Bvg+ tropic phage-1 (BPP-1) bacteriophage.
 2. The molecule of claim 1, wherein the sequence of said TR is an imperfect direct repeat of the sequence in said VR due to the substitution of one or more adenine nucleotides in said TR, or substitution of one or more non-adenine nucleotides in VR by adenines in TR, or substitution of VR adenine nucleotides by non-adenine nucleotides in TR.
 3. The molecule of claim 1, wherein said VR is all or part of a sequence encoding a binding partner of a target molecule.
 4. The molecule of claim 3, further comprising all of the sequence encoding said binding partner, wherein said VR is optionally the 3′ portion of said sequence encoding said binding partner.
 5. The molecule of claim 3, wherein said binding partner binds a cell surface molecule, a hormone, a growth or differentiation factor, a receptor, a ligand of a receptor, a bacterial cell wall molecule, a viral particle, an immunity or immune tolerance factor, or an MHC molecule.
 6. The molecule of claim 3, wherein said binding partner is a bacteriocin.
 7. The molecule or pair of molecules of claim 1, wherein said TR and RT coding sequence are transcribed under the control of a heterologous promoter, such as the fha promoter.
 8. A cell containing the molecule or pair of molecules of claim 1.
 9. A method of preparing the single molecule of claim 1, said method comprising operably linking a first nucleic acid molecule comprising said VR to a second nucleic acid molecule comprising said TR such that said TR is a template sequence that directs site specific mutagenesis of said VR.
 10. A method of preparing one of the molecule or pair of molecules of claim 7, said method comprising operably linking a heterologous promoter sequence to a nucleic acid molecule comprising said TR and RT coding sequence.
 11. A method of site-specific mutagenesis of a nucleic acid sequence of interest, said method comprising obtaining a nucleic acid molecule or pair of molecules of claim 1 wherein said VR comprises said nucleic acid sequence of interest and said TR is an imperfect or perfect repeat of said sequence of interest, wherein said TR is a template sequence operably linked to said sequence of interest to direct site-specific mutagenesis of the sequence, and wherein said TR is an imperfect repeat due to the substitution of one or more adenine nucleotides for a non-adenine nucleotide in said sequence of interest or vice versa; and allowing said nucleic acid molecule to be expressed in a cell such that one or more nucleotide positions of said sequence of interest is substituted by a different nucleotide.
 12. The method of claim 11, wherein more than one nucleotide position of said sequence of interest is substituted.
 13. The method of claim 11, wherein said sequence of interest encodes all or part of a binding partner of a target molecule.
 14. The method of claim 13, wherein the binding properties of said binding partner are altered.
 15. An isolated nucleic acid molecule comprising a donor template region (TR) and an operably linked RT coding sequence wherein said molecule is not from Bvg+ tropic phage-1 (BPP-1), Bvg⁻ tropic phage-1 (BMP-1), or Bvg indiscriminate phage-1 (BIP-1) bacteriophage.
 16. The molecule of claim 15, wherein the molecule is isolated from a bacteriophage, a prophage of a bacterium, a bacterium, or a spirochete.
 17. A plurality or library of nucleic acid molecules according to claim 1.
 18. The plurality or library of claim 17, wherein the VR has undergone diversification directed by the TR.
 19. A method of identifying IMH sequences, said method comprising identifying an RT coding sequence in a genome of an organism; searching the coding strand within about 5 kb of the RT ORF and identifying an IMH-like sequence containing an 18-48 nucleotide stretch of adenine-depleted DNA; and a) using the putative IMH-like sequence to search genome-wide for a closely-related putative IMH and comparing the DNA sequences located 5′ to the IMH-like and putative IMH sequences to find TR and VR regions, respectively; or b) using the sequence of the DNA, 100-350 base-pairs long, located 5′ to the IMH-like sequence to identify a putative TR, and using all or parts of this TR and IMH-like sequence to search genome-wide for a matching putative VR and IMH sequence.
 20. The method of claim 19 wherein said RT coding sequence is identified by searching for one or both amino acid sequences IGXXXSQ (SEQ ID NO:32) or LGXXXSQ (SEQ ID NO:33); or wherein the IMH-like, or IMH, sequence contains a conserved sequence selected from TCGG, TTTTCG, or TTGT; or wherein the identified TR and VR sequences can be between about 100-350 base-pairs long and should be more than about 80% homologous, with the majority of differences being at the locations of the adenine bases in the TR.